process hangs on kernel 4.19 and 5.2, not on 4.9

Jan-Pieter Cornet <johnpc@xxxxxxxxxx> · Fri, 1 Nov 2019 12:00:02 +0100

Hi,

We are running Dovecot with NFS-mounted mail spools. After a recent round of upgrades, we noticed that occasionally a process hangs in state "D" while accessing a file over NFS. This is on the default Debian buster kernel: 4.19.67-2+deb10u1. When we downgraded to the older default stretch kernel, 4.9.189-3+deb9u1 (without changing anything else on the system), the hangs no longer occurred. I've also reproduced the hang on the buster backports kernel, 5.2.17-1~bpo10+1.

This is with Dovecot 2.3.8 (latest), storing mails using mdbox. The NFS server is a NetApp running ONTAP 9.6P2 in cluster mode.

By "hang" I mean that the process is unresponsive to anything but a kill -KILL signal. strace shows nothing (and after attaching, strace itself cannot be killed except with kill -KILL). The NetApp shows that the client is holding 2 files locked with an fcntl byte-range lock (preventing any other process from writing to the mailbox). Only one process gets stuck (commonly lmtp, writing to the mailbox, but I've also seen IMAP APPEND causing the problem on the production platform). After killing the stuck process, the mdbox index files are damaged and need rebuilding using "doveadm force-resync" (but that's just because the writing process was rudely interrupted).
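
To spot such a process without watching ps by hand, something like the following could be used. It is only a minimal sketch: the function names and the proc_root parameter are made up for illustration, and it relies on the "State:" line that Linux exposes in /proc/PID/status. Processes can sit in state "D" briefly during normal I/O, so a single hit proves nothing; a PID that shows up on every run is the interesting one.

```python
import os


def parse_state(status_text):
    """Return the one-letter state from the text of /proc/PID/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def find_d_state(proc_root="/proc"):
    """Return a sorted list of PIDs currently in state "D" (uninterruptible sleep).

    proc_root is a hypothetical parameter so the function can be pointed
    at a copy of /proc for testing.
    """
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if parse_state(text) == "D":
            hung.append(int(entry))
    return sorted(hung)
```

Calling find_d_state() a few times, some seconds apart, and keeping only the PIDs present in every result gives a rough list of candidates.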

On the 4.19 kernel, /proc/$PID/stack of the hanging process shows the following for every hang that I've reproduced:
[<0>] nfs_iocounter_wait+0x74/0xa0 [nfs]
[<0>] do_unlk+0x8c/0xe0 [nfs]
[<0>] __x64_sys_flock+0xa4/0xf0
[<0>] do_syscall_64+0x53/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

On the 5.2 kernel, /proc/$PID/stack looks like this:
[<0>] do_unlk+0x8e/0xe0 [nfs]
[<0>] __x64_sys_flock+0xa7/0x100
[<0>] do_syscall_64+0x53/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
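
To compare saved traces like the two above mechanically, a small parser can help. This is a rough sketch, not a general kernel trace parser: FRAME_RE, parse_stack and shared_symbols are names invented here, and the regular expression only covers the "[<addr>] symbol+0xoff/0xsize [module]" layout seen in these traces.

```python
import re

# Matches lines such as "[<0>] do_unlk+0x8c/0xe0 [nfs]".
# Lines without a symbol+offset part (e.g. "[<0>] 0xffffffffffffffff")
# do not match and are skipped.
FRAME_RE = re.compile(
    r"^\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?$"
)


def parse_stack(text):
    """Return a list of (symbol, module) tuples; module is None for core kernel code."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("sym"), match.group("mod")))
    return frames


def shared_symbols(frames_a, frames_b):
    """Return the symbols of frames_a that also occur in frames_b, in order."""
    in_b = {sym for sym, _ in frames_b}
    return [sym for sym, _ in frames_a if sym in in_b]
```

Fed the 4.19 and 5.2 traces above, shared_symbols() returns do_unlk, __x64_sys_flock, do_syscall_64 and entry_SYSCALL_64_after_hwframe; only the nfs_iocounter_wait frame is specific to the 4.19 trace.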

I can reproduce the hang on a test system, but it takes anywhere from a few minutes up to 30 minutes to occur. To trigger the bug, I start two "while true; do swaks -s localhost --proto LMTP ...; done" endless loops that hammer a single destination mailbox with mail. A separate job goes in every minute and deletes the mails again. Running a single "swaks" hammering loop does not trigger the hang. And as mentioned, the hang does not occur after downgrading to 4.9 (so we're currently running the production cluster on the older kernel).
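
The shape of that load (two concurrent delivery loops plus a periodic cleanup) can be sketched as a small driver. This is an illustration only, not the script used on the test system: run_load, hammer and all parameters are hypothetical, and the delivery and cleanup commands are passed in by the caller rather than spelled out here. A per-run timeout is added so that a delivery that never finishes is counted instead of blocking the driver.

```python
import subprocess
import threading
import time


def hammer(cmd, stop, stats, lock, timeout):
    """Run cmd repeatedly until stop is set, counting completed and timed-out runs."""
    while not stop.is_set():
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
            key = "done"
        except subprocess.TimeoutExpired:
            key = "timeouts"  # possible sign that the server side is stuck
        with lock:
            stats[key] += 1


def run_load(deliver_cmd, cleanup_cmd, writers=2, cleanup_interval=60.0,
             duration=1800.0, timeout=120.0):
    """Run `writers` delivery loops in parallel, with cleanup_cmd every cleanup_interval seconds.

    Returns (deliveries, cleanups, timeouts). All defaults are assumptions
    loosely based on the description above.
    """
    stop = threading.Event()
    lock = threading.Lock()
    stats = {"done": 0, "timeouts": 0}
    threads = [
        threading.Thread(target=hammer,
                         args=(deliver_cmd, stop, stats, lock, timeout))
        for _ in range(writers)
    ]
    for thread in threads:
        thread.start()
    deadline = time.monotonic() + duration
    cleanups = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(min(cleanup_interval, remaining))
        if time.monotonic() < deadline:
            subprocess.run(cleanup_cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
            cleanups += 1
    stop.set()
    for thread in threads:
        thread.join()
    return stats["done"], cleanups, stats["timeouts"]
```

Here deliver_cmd would be the swaks invocation as an argument list and cleanup_cmd whatever command removes the delivered mails. Setting writers=1 corresponds to the single-loop case that does not trigger the hang.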

I'd appreciate any help in getting this resolved. I can also provide a detailed description of the test setup so someone can hopefully reproduce this, or I can try to dig deeper (I basically saved everything readable from /proc/$PID for the reproduced hangs, don't know if anything else is interesting in there). I could even try to make a tcpdump of the traffic to the NFS server, if you think that helps, although that will likely produce a pretty massive capture file. Or I can try with other kernel versions. So far I have only tried the available (pre-packaged) Debian kernels.
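
For anyone who wants to collect the same kind of data, saving a few /proc/$PID files could look roughly like this. It is a sketch under assumptions: snapshot_proc and DEFAULT_FILES are hypothetical names, and the file list is just a small selection of standard per-process files, not the complete set that was saved for the reproduced hangs. Reading /proc/PID/stack normally requires root.

```python
import os

# Hypothetical selection of per-process files; extend as needed.
DEFAULT_FILES = ("stack", "status", "stat", "wchan", "syscall", "cmdline")


def snapshot_proc(pid, dest_dir, proc_root="/proc", names=DEFAULT_FILES):
    """Copy selected /proc/<pid>/ files into dest_dir and return the names saved.

    Files that are missing or unreadable (for example "stack" without
    root privileges) are skipped silently.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in names:
        src = os.path.join(proc_root, str(pid), name)
        try:
            with open(src, "rb") as f:
                data = f.read()
        except OSError:
            continue
        with open(os.path.join(dest_dir, name), "wb") as out:
            out.write(data)
        saved.append(name)
    return saved
```

Writing each snapshot to a directory named after the PID and kernel version keeps the data from different hangs apart.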

Thanks for any input,

--
Jan-Pieter Cornet <johnpc@xxxxxxxxxx>
Systeembeheer XS4ALL Internet bv
www.xs4all.nl
