On Thu, 2022-01-06 at 13:36 -0500, Trond Myklebust wrote:
> On Thu, 2022-01-06 at 09:48 +1100, Dave Chinner wrote:
> > On Wed, Jan 05, 2022 at 08:45:05PM +0000, Trond Myklebust wrote:
> > > On Tue, 2022-01-04 at 21:09 -0500, Trond Myklebust wrote:
> > > > On Tue, 2022-01-04 at 12:22 +1100, Dave Chinner wrote:
> > > > > On Tue, Jan 04, 2022 at 12:04:23AM +0000, Trond Myklebust wrote:
> > > > > > We have different reproducers. The common feature appears to be
> > > > > > the need for a decently fast box with fairly large memory (128GB
> > > > > > in one case, 400GB in the other). It has been reproduced with
> > > > > > HDs, SSDs and NVMe systems.
> > > > > >
> > > > > > On the 128GB box, we had it set up with 10+ disks in a JBOD
> > > > > > configuration and were running the AJA system tests.
> > > > > >
> > > > > > On the 400GB box, we were just serially creating large (> 6GB)
> > > > > > files using fio, and that was occasionally triggering the issue.
> > > > > > However, doing an strace of that workload to disk reproduced the
> > > > > > problem faster :-).
> > > > >
> > > > > Ok, that matches up with the "lots of logically sequential dirty
> > > > > data on a single inode in cache" vector that is required to create
> > > > > really long bio chains on individual ioends.
> > > > >
> > > > > Can you try the patch below and see if it addresses the issue?
> > > >
> > > > That patch does seem to fix the soft lockups.
> > >
> > > Oops... Strike that, apparently our tests just hit the following when
> > > running on AWS with that patch.
> >
> > OK, so there are also large contiguous physical extents being
> > allocated in some cases here.
> >
> > > So it was harder to hit, but we still did eventually.
> >
> > Yup, that's what I wanted to know - it indicates that both the
> > filesystem completion processing and the iomap page processing play
> > a role in the CPU usage. More complex patch for you to try below...
> >
> > Cheers,
> >
> > Dave.
>
> Hi Dave,
>
> This patch got further than the previous one. However, it too failed on
> the same AWS setup after we started creating larger (in this case 52GB)
> files. The previous patch failed at 15GB.
>
> NR_06-18:00:17 pm-46088DSX1 /mnt/data-portal/data $ ls -lh
> total 59G
> -rw-r----- 1 root root  52G Jan 6 18:20 100g
> -rw-r----- 1 root root 9.8G Jan 6 17:38 10g
> -rw-r----- 1 root root   29 Jan 6 17:36 file
> NR_06-18:20:10 pm-46088DSX1 /mnt/data-portal/data $
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:22:44 ...
> kernel:[ 5548.082987] watchdog: BUG: soft lockup - CPU#10 stuck for 24s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:23:44 ...
> kernel:[ 5608.082895] watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:27:08 ...
> kernel:[ 5812.082587] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:27:36 ...
> kernel:[ 5840.082533] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:28:08 ...
> kernel:[ 5872.082455] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:28:40 ...
> kernel:[ 5904.082400] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:29:16 ...
> kernel:[ 5940.082243] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:29:44 ...
> kernel:[ 5968.082249] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:30:24 ...
> kernel:[ 6008.082204] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:31:08 ...
> kernel:[ 6052.082194] watchdog: BUG: soft lockup - CPU#10 stuck for 24s! [kworker/10:0:18995]
>
> Message from syslogd@pm-46088DSX1 at Jan 6 18:31:48 ...
> kernel:[ 6092.082010] watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [kworker/10:0:18995]
>

Just to confirm that these are indeed the same XFS hangs:

[Thu Jan 6 18:33:58 2022] watchdog: BUG: soft lockup - CPU#10 stuck for 24s! [kworker/10:0:18995]
[Thu Jan 6 18:33:58 2022] Modules linked in: nfsv3 auth_name bpf_preload xt_nat veth nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfsidmap nfs fscache netfs dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter bridge stp llc overlay xt_sctp nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_conntrack_ftp xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter bonding tls ipmi_msghandler intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ip_tables xfs nvme ena nvme_core crc32c_intel
[Thu Jan 6 18:33:58 2022] CPU: 10 PID: 18995 Comm: kworker/10:0 Kdump: loaded Tainted: G L 5.15.12-200.pd.17721.el7.x86_64 #1
[Thu Jan 6 18:33:58 2022] Hardware name: Amazon EC2 r5b.4xlarge/, BIOS 1.0 10/16/2017
[Thu Jan 6 18:33:58 2022] Workqueue: xfs-conv/nvme1n1 xfs_end_io [xfs]
[Thu Jan 6 18:33:58 2022] RIP: 0010:_raw_spin_unlock_irqrestore+0x1c/0x20
[Thu Jan 6 18:33:58 2022] Code: 92 cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 01 c3 fb 66 0f 1f 44 00 00 <c3> 0f 1f 00 0f 1f 44 00 00 8b 07 a9 ff 01 00 00 75 21 b8 00 02 00
[Thu Jan 6 18:33:58 2022] RSP: 0018:ffffac380beffd08 EFLAGS: 00000206
[Thu Jan 6 18:33:58 2022] RAX: 0000000000000001 RBX: 00000000000015c0 RCX: ffffffffffffb9a2
[Thu Jan 6 18:33:58 2022] RDX: ffffffff85809148 RSI: 0000000000000206 RDI: ffffffff85809140
[Thu Jan 6 18:33:58 2022] RBP: 0000000000000206 R08: ffffac380888fc80 R09: ffffac380888fc80
[Thu Jan 6 18:33:58 2022] R10: 00000000000000a0 R11: 0000000000000000 R12: ffffffff85809140
[Thu Jan 6 18:33:58 2022] R13: ffffe94e2ef6d780 R14: ffff95fad1053438 R15: ffffe94e2ef6d780
[Thu Jan 6 18:33:58 2022] FS:  0000000000000000(0000) GS:ffff9612a3c80000(0000) knlGS:0000000000000000
[Thu Jan 6 18:33:58 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Jan 6 18:33:58 2022] CR2: 00007f9294723080 CR3: 0000001692810004 CR4: 00000000007706e0
[Thu Jan 6 18:33:58 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Thu Jan 6 18:33:58 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Thu Jan 6 18:33:58 2022] PKRU: 55555554
[Thu Jan 6 18:33:58 2022] Call Trace:
[Thu Jan 6 18:33:58 2022]  <TASK>
[Thu Jan 6 18:33:58 2022]  wake_up_page_bit+0x79/0xe0
[Thu Jan 6 18:33:58 2022]  end_page_writeback+0xc4/0xf0
[Thu Jan 6 18:33:58 2022]  iomap_finish_ioend+0x130/0x260
[Thu Jan 6 18:33:58 2022]  ? xfs_iunlock+0xa4/0xf0 [xfs]
[Thu Jan 6 18:33:58 2022]  iomap_finish_ioends+0x77/0xa0
[Thu Jan 6 18:33:58 2022]  xfs_end_ioend+0x5a/0x120 [xfs]
[Thu Jan 6 18:33:58 2022]  xfs_end_io+0xa1/0xc0 [xfs]
[Thu Jan 6 18:33:58 2022]  process_one_work+0x1f1/0x390
[Thu Jan 6 18:33:58 2022]  worker_thread+0x53/0x3e0
[Thu Jan 6 18:33:58 2022]  ? process_one_work+0x390/0x390
[Thu Jan 6 18:33:58 2022]  kthread+0x127/0x150
[Thu Jan 6 18:33:58 2022]  ? set_kthread_struct+0x40/0x40
[Thu Jan 6 18:33:58 2022]  ret_from_fork+0x22/0x30
[Thu Jan 6 18:33:58 2022]  </TASK>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
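The call trace above shows the xfs-conv kworker looping in iomap_finish_ioend()/end_page_writeback(), ending writeback on page after page of a single very large ioend chain, which is the "long bio chains on individual ioends" mechanism Dave describes earlier in the thread. Below is a minimal sketch of that loop shape and of one plausible mitigation (yielding the CPU between ioends). It is not the real iomap/XFS code and not Dave's actual patch; the demo_* structure and function names are purely illustrative.

```c
/*
 * Illustrative sketch only: a simplified stand-in for the completion
 * path seen in the trace (xfs_end_io -> iomap_finish_ioends ->
 * iomap_finish_ioend -> end_page_writeback).
 */
#include <linux/pagemap.h>	/* end_page_writeback() */
#include <linux/sched.h>	/* cond_resched() */

/* Hypothetical, simplified ioend: a chain of pages under writeback. */
struct demo_ioend {
	struct demo_ioend	*io_next;
	struct page		**io_pages;
	unsigned int		io_npages;
};

static void demo_finish_ioends(struct demo_ioend *ioend)
{
	while (ioend) {
		struct demo_ioend *next = ioend->io_next;
		unsigned int i;

		/*
		 * With tens of gigabytes of contiguous dirty data on one
		 * inode, a single chain can cover millions of pages.  This
		 * loop runs in a kworker; if it never yields, the
		 * soft-lockup watchdog fires after ~20s of CPU time,
		 * exactly as in the logs above.
		 */
		for (i = 0; i < ioend->io_npages; i++)
			end_page_writeback(ioend->io_pages[i]);

		/*
		 * Assumed mitigation (may differ from the patches being
		 * tested in this thread): give the scheduler a chance
		 * between ioends so the worker cannot monopolise the CPU.
		 */
		cond_resched();

		ioend = next;
	}
}
```

How often to yield (per ioend, per N pages, or by capping how large an ioend is allowed to grow in the first place) is the kind of trade-off the successive patches in this thread appear to be probing, given that the lockups persisted but became harder to hit.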