RE: fs: avoid softlockups in s_inodes iterators commit

David Mozes <david.mozes@xxxxxxx> · Wed, 17 Mar 2021 16:40:15 +0000

Sure Eric,
Sending the details again.
I run a very heavy traffic load (iSCSI storage-related I/O) on GCP.
After one day of running, my kernel gets stuck in one of two typical cases, both involving a page fault:
1)	A soft lockup, as described in the first typical case.
2)	A panic, as described in the second case.

First typical case (the soft lockup happens on several CPUs):

Feb 21 07:38:52 c-node02 kernel: [242408.563170]  ? flush_tlb_func_common.constprop.10+0x250/0x250
Feb 21 07:38:52 c-node02 kernel: [242408.563171]  on_each_cpu_mask+0x23/0x60
Feb 21 07:38:52 c-node02 kernel: [242408.563173]  ? x86_configure_nx+0x40/0x40
Feb 21 07:38:52 c-node02 kernel: [242408.563174]  on_each_cpu_cond_mask+0xa0/0xd0
Feb 21 07:38:52 c-node02 kernel: [242408.563175]  ? flush_tlb_func_common.constprop.10+0x250/0x250
Feb 21 07:38:52 c-node02 kernel: [242408.563177]  flush_tlb_mm_range+0xbc/0xf0
Feb 21 07:38:52 c-node02 kernel: [242408.563179]  ptep_clear_flush+0x40/0x50
Feb 21 07:38:52 c-node02 kernel: [242408.563180]  try_to_unmap_one+0x2ae/0xae0
Feb 21 07:38:52 c-node02 kernel: [242408.563184]  ? mutex_lock+0xe/0x30
Feb 21 07:38:52 c-node02 kernel: [242408.563186]  rmap_walk_anon+0x13a/0x2c0
Feb 21 07:38:52 c-node02 kernel: [242408.563188]  try_to_unmap+0x9c/0xf0
Feb 21 07:38:52 c-node02 kernel: [242408.563190]  ? page_remove_rmap+0x330/0x330
Feb 21 07:38:52 c-node02 kernel: [242408.563192]  ? page_not_mapped+0x20/0x20
Feb 21 07:38:52 c-node02 kernel: [242408.563193]  ? page_get_anon_vma+0x80/0x80
Feb 21 07:38:52 c-node02 kernel: [242408.563195]  ? invalid_mkclean_vma+0x20/0x20
Feb 21 07:38:52 c-node02 kernel: [242408.563196]  migrate_pages+0x3cd/0xc80
Feb 21 07:38:52 c-node02 kernel: [242408.563197]  ? do_pages_stat+0x180/0x180
Feb 21 07:38:52 c-node02 kernel: [242408.563198]  migrate_misplaced_page+0x15e/0x270
Feb 21 07:38:52 c-node02 kernel: [242408.563200]  __handle_mm_fault+0xd80/0x12f0
Feb 21 07:38:52 c-node02 kernel: [242408.563202]  handle_mm_fault+0xc2/0x1f0
Feb 21 07:38:52 c-node02 kernel: [242408.563204]  __do_page_fault+0x23e/0x4f0
Feb 21 07:38:52 c-node02 kernel: [242408.563206]  do_page_fault+0x30/0x110
Feb 21 07:38:52 c-node02 kernel: [242408.563207]  page_fault+0x3e/0x50
Feb 21 07:38:52 c-node02 kernel: [242408.563209] RIP: 0033:0x7f27fffb9e73
Feb 21 07:38:52 c-node02 kernel: [242408.563211] Code: 89 6d e8 48 89 fb 4c 89 75 f0 4c 89 7d f8 49 89 f6 4c 89 65 e0 48 81 ec c0 06 00 00 4c 8b 3d 3c a1 34 00 49 89 d5 64 41 8b 07 <89> 85 dc fa ff ff 8b 87 c0 00 00 00 85 c0 0f 85 b9 01 00 00 c7 87
Feb 21 07:38:52 c-node02 kernel: [242408.563211] RSP: 002b:00007f12a37fda10 EFLAGS: 00010202
Feb 21 07:38:52 c-node02 kernel: [242408.563213] RAX: 0000000000000000 RBX: 00007f12a37fe0e0 RCX: 0000000000000000
Feb 21 07:38:52 c-node02 kernel: [242408.563214] RDX: 00007f12a37fe200 RSI: 00000000017a9453 RDI: 00007f12a37fe0e0
Feb 21 07:38:52 c-node02 kernel: [242408.563214] RBP: 00007f12a37fe0d0 R08: 0000000000000000 R09: 00000000017c7550
Feb 21 07:38:52 c-node02 kernel: [242408.563215] R10: 0000000000000000 R11: 00000000000003f8 R12: 00000000017a9453
Feb 21 07:38:52 c-node02 kernel: [242408.563216] R13: 00007f12a37fe200 R14: 00000000017a9453 R15: fffffffffffffe90
Feb 21 07:38:52 c-node02 kernel: [242408.604094] watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [km_target_creat:49068]
Feb 21 07:38:52 c-node02 kernel: [242408.604095] Modules linked in: iscsi_scst(OE) crc32c_intel(O) scst_local(OE) netconsole(O) scst_user(OE) scst(OE) drbd(O) lru_cache(O) loop(O) 8021q(O) mrp(O) garp(O) nfsd(O) nfs_acl(O) auth_rpcgss(O) lockd(O) sunrpc(O) grace(O) xt_MASQUERADE(O) xt_nat(O) xt_state(O) iptable_nat(O) xt_addrtype(O) xt_conntrack(O) nf_nat(O) nf_conntrack(O) nf_defrag_ipv4(O) nf_defrag_ipv6(O) libcrc32c(O) br_netfilter(O) bridge(O) stp(O) llc(O) overlay(O) be2iscsi(O) iscsi_boot_sysfs(O) bnx2i(O) cnic(O) uio(O) cxgb4i(O) cxgb4(O) cxgb3i(O) libcxgbi(O) cxgb3(O) mdio(O) libcxgb(O) ib_iser(OE) iscsi_tcp(O) libiscsi_tcp(O) libiscsi(O) scsi_transport_iscsi(O) dm_multipath(O) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) ib_uverbs(OE) mlx5_core(OE) mdev(OE) mlxfw(OE) ptp(O) pps_core(O) mlx4_ib(OE) ib_core(OE) mlx4_core(OE) mlx_compat(OE) fuse(O) binfmt_misc(O) pvpanic(O) pcspkr(O) virtio_rng(O) virtio_net(O) net_failover(O) failover(O) i2
:
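
Since the soft lockup is reported on several CPUs, it can help to count the watchdog reports per CPU and task before reading the individual traces. Below is a minimal sketch, assuming the syslog line format shown in the excerpt above; the function name summarize_soft_lockups is only illustrative and is not part of any existing tool.

```python
import re
from collections import Counter

# Matches the watchdog line format seen in the trace above, e.g.
# "watchdog: BUG: soft lockup - CPU#45 stuck for 22s! [km_target_creat:49068]"
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Return a Counter mapping (cpu, task name) to the number of reports."""
    hits = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            hits[(int(match.group("cpu")), match.group("comm"))] += 1
    return hits
```

The function accepts any iterable of log lines, so an open log file object can be passed to it directly; lines that are not watchdog reports (backtrace frames, register dumps) are simply ignored.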

Second typical case (panic):

From the console:

[123080.813877] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[    0.000000] Linux version 5.4.80-KM8 (david.mozes@kbuilder64-tc8-test1) (gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)) #14 SMP Mon Jan 11 16:21:21 IST 2021

[    0.000000] Command line: ro root=LABEL=/ rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 nompath append="nmi_watchdog=2"

From the vmcore-dmesg:

[121271.606463] ll header: 00000000: 42 01 0a ad 0c 02 42 01 0a ad 0c 01 08 00
[122656.730235] sh (27931): drop_caches: 3
[123080.813877] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[123080.813887] sched: RT throttling activated
[123080.821706] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: serial8250_console_write+0x26e/0x270

After I commented out the cond_resched(), everything looks more stable.
I will try another run, as Eric suggests, with the
cond_resched()
placed before the
invalidate_mapping_pages()
call, and will report back on the behavior.
I think we have a very stressful environment on GCP for testing this.
Thx
David

-----Original Message-----
From: Eric Sandeen <sandeen@xxxxxxxxxxx> 
Sent: Wednesday, March 17, 2021 3:28 AM
To: David Mozes <david.mozes@xxxxxxx>; linux-fsdevel@xxxxxxxxxxxxxxx
Cc: sandeen@xxxxxxxxxx
Subject: Re: fs: avoid softlockups in s_inodes iterators commit

On 3/16/21 3:56 PM, David Mozes wrote:
> Hi,
> Per Eric's request, I am forwarding this discussion to the list first.
> My first answers are inline.

ok, but you stripped out all of the other useful information like backtraces, stack corruption, etc. You need to provide the evidence of the actual failure for the list to see. Also ..

> -----Original Message-----
> From: Eric Sandeen <sandeen@xxxxxxxxxx>
> Sent: Tuesday, March 16, 2021 10:18 PM
> To: David Mozes <david.mozes@xxxxxxx>
> Subject: Re: Mail from David.Mozes regarding fs: avoid softlockups in 
> s_inodes iterators commit
> 
> On 3/16/21 3:02 PM, David Mozes wrote:
>> Hi Eric,
>>

...

> David> Not sure yet, will check.
>> 5.4.8 vanilla kernel it custom
> 
> Is it vanilla, or is it custom? 5.4.8 or 5.4.80?
> 
> David> 5.4.80, small custom, as I mentioned.

what is a "small custom?" Can you reproduce it on an unmodified upstream kernel?

-Eric