Re: [PATCH] mm: Fix false softlockup during pfn range removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20.06.20 01:12, Ben Widawsky wrote:
> When working with very large nodes, poisoning the struct pages (for
> which there will be very many) can take a very long time. If the system
> is using voluntary preemptions, the software watchdog will not be able
> to detect forward progress. This patch addresses this issue by offering
> to give up time like __remove_pages() does.  This behavior was
> introduced in v5.6 with:
> commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")

We try to stay <= 72 chars in the commit message (except for kernel splats).

> 
> Alternately, init_page_poison could do this cond_resched(), but it seems
> to me that the caller of init_page_poison() is what actually knows
> whether or not it should relax its own priority.
> 
> Based on Dan's notes, I think this is perfectly safe:
> commit f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}")

We shouldn't be holding any spin locks, so it's safe. (we could think
about doing this outside of the memory hotplug lock in the case of
devmem, however, it's only a debugging feature so we most probably don't
care).

> 
> Aside from fixing the lockup, it is also a friendlier thing to do on
> lower core systems that might wipe out large chunks of hotplug memory
> (probably not a very common case).

It really only is an issue for devmem. Ordinary hotplugged system memory
is not affected (onlined/offlined in memory block granularity).

> 
> Fixes this kind of splat:
> [  352.142079] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [daxctl:9922]
> [  352.150067] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat kmem intel_rapl_msr intel_rapl_common rpcrdma sunrpc rdma_ucm ib_iser isst_if_common rdma_cm skx_edac iw_cm ib_cm x86_pkg_temp_thermal libiscsi intel_powerclamp scsi_transport_iscsi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i40iw intel_cstate iTCO_wdt ib_uverbs iTCO_vendor_support device_dax ib_core joydev intel_uncore i2c_i801 lpc_ich ipmi_ssif ioatdma dca wmi ipmi_si ipmi_devintf ipmi_msghandler dax_pmem
> [  352.150123]  dax_pmem_core acpi_pad acpi_power_meter drm ip_tables xfs libcrc32c nd_pmem nd_btt i40e e1000e crc32c_intel nfit
> [  352.260774] irq event stamp: 138450
> [  352.264692] hardirqs last  enabled at (138449): [<ffffffffa1001f26>] trace_hardirqs_on_thunk+0x1a/0x1c
> [  352.275134] hardirqs last disabled at (138450): [<ffffffffa1001f42>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  352.285662] softirqs last  enabled at (138448): [<ffffffffa1e00347>] __do_softirq+0x347/0x456
> [  352.295233] softirqs last disabled at (138443): [<ffffffffa10c416d>] irq_exit+0x7d/0xb0
> [  352.304210] CPU: 46 PID: 9922 Comm: daxctl Not tainted 5.7.0-BEN-14238-g373c6049b336 #30
> [  352.313283] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0578.D07.1902280810 02/28/2019
> [  352.324308] RIP: 0010:memset_erms+0x9/0x10
> [  352.328905] Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> [  352.349953] RSP: 0018:ffffc90021b5fd50 EFLAGS: 00010202 ORIG_RAX: ffffffffffffff13
> [  352.358450] RAX: 00000000000000ff RBX: ffff88983ffd5000 RCX: 0000000065df0e40
> [  352.366457] RDX: 00000003a0000000 RSI: 00000000ffffffff RDI: ffffea039b20f1c0
> [  352.374465] RBP: ffff88983ffd6c00 R08: 0000000000000000 R09: ffffea0061000000
> [  352.382473] R10: 0000000000000001 R11: 0000000000000000 R12: ffffea006f808000
> [  352.390480] R13: 0000000001840000 R14: 000000000e800000 R15: ffff8997d7b764e0
> [  352.398487] FS:  00007f724ef81780(0000) GS:ffff8997df100000(0000) knlGS:0000000000000000
> [  352.407562] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  352.414011] CR2: 00007ffcd4070768 CR3: 000001178c722002 CR4: 00000000003606e0
> [  352.422056] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  352.430092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  352.438115] Call Trace:
> [  352.440865]  remove_pfn_range_from_zone+0x3a/0x380
> [  352.446244]  ? cpumask_next+0x17/0x20
> [  352.450361]  memunmap_pages+0x17f/0x280
> [  352.454670]  release_nodes+0x22a/0x260
> [  352.458888]  __device_release_driver+0x172/0x220
> [  352.464070]  device_driver_detach+0x3e/0xa0
> [  352.468753]  unbind_store+0x113/0x130
> [  352.472868]  kernfs_fop_write+0xdc/0x1c0
> [  352.477273]  vfs_write+0xde/0x1d0
> [  352.482218]  ksys_write+0x58/0xd0
> [  352.487207]  do_syscall_64+0x5a/0x120
> [  352.492529]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> [  352.499446] RIP: 0033:0x7f724f40b5e7
> [  352.504673] Code: Bad RIP value.
> [  352.509484] RSP: 002b:00007ffcd40738f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  352.519213] RAX: ffffffffffffffda RBX: 00007f724ef816a8 RCX: 00007f724f40b5e7
> [  352.528410] RDX: 0000000000000007 RSI: 00005617d7cd1277 RDI: 0000000000000003
> [  352.537573] RBP: 0000000000000003 R08: 00000000ffffffff R09: 00007ffcd40737d0
> [  352.546764] R10: 0000000000000000 R11: 0000000000000246 R12: 00005617d7cd1277
> [  352.555929] R13: 0000000000000000 R14: 0000000000000007 R15: 00005617d7cd1230
> [  370.353742] Built 2 zonelists, mobility grouping on.  Total pages: 49050381
> [  370.373317] Policy zone: Normal
> [  374.948164] Built 3 zonelists, mobility grouping on.  Total pages: 49312525
> [  375.017496] Policy zone: Normal

I'd have stripped this to a reasonable minimum.

> 
> Fixes: commit d33695b16a9f ("mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()")
> Reported-by: "Scargall, Steve" <steve.scargall@xxxxxxxxx>
> Reported-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
> Signed-off-by: Ben Widawsky <ben.widawsky@xxxxxxxxx>
> ---
>  mm/memory_hotplug.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9b34e03e730a..da374cd3d45b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -471,11 +471,20 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
>  				      unsigned long start_pfn,
>  				      unsigned long nr_pages)
>  {
> +	const unsigned long end_pfn = start_pfn + nr_pages;
>  	struct pglist_data *pgdat = zone->zone_pgdat;
> -	unsigned long flags;
> +	unsigned long pfn, cur_nr_pages, flags;
>  
>  	/* Poison struct pages because they are now uninitialized again. */
> -	page_init_poison(pfn_to_page(start_pfn), sizeof(struct page) * nr_pages);
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += cur_nr_pages) {
> +		cond_resched();
> +
> +		/* Select all remaining pages up to the next section boundary */
> +		cur_nr_pages =
> +			min(end_pfn - pfn, SECTION_ALIGN_UP(pfn + 1) - pfn);

Nit: I'd have used the same indentation as the code you copied this from.

> +		page_init_poison(pfn_to_page(pfn),
> +				 sizeof(struct page) * cur_nr_pages);
> +	}
>  
>  #ifdef CONFIG_ZONE_DEVICE
>  	/*
> 

Thanks!

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

-- 
Thanks,

David / dhildenb




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux