Does setting /proc/sys/fs/leases-enable to 0 work while the system is up? I was expecting to see lslocks | grep DELE | wc go down. It’s not. It’s staying around 1850.

> On Aug 9, 2021, at 2:30 PM, Timothy Pearson <tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> FWIW that's *exactly* what we see. Eventually, if the server is left alone for enough time, even the login system stops responding -- it's as if the I/O subsystem degrades and eventually blocks entirely.
>
> ----- Original Message -----
>> From: "hedrick" <hedrick@xxxxxxxxxxx>
>> To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>
>> Cc: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "linux-nfs"
>> <linux-nfs@xxxxxxxxxxxxxxx>
>> Sent: Monday, August 9, 2021 1:29:30 PM
>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
>
>> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
>> occurred I saw a process at 100% running rpciod. I tried to do a “sync” and
>> reboot, but the sync hung.
>>
>> The last time I couldn’t get data, but the kernel was running and responding to
>> ping. An ssh session responded to CR but when I tried to sudo it hung. Attempt
>> to login hung. Oddly, even though the ssh session responded to CR, syslog
>> entries on the local system stopped until the reboot. However we also send
>> syslog entries to a central server. Those continued and showed a continuing set
>> of mounts and unmounts happening through the reboot.
>>
>> I was going to get a stack trace of the 100% process if that happened again, but
>> last time I wasn’t in a situation to do that. I don’t think users will put up
>> with further attempts to debug, so for the moment I’m going to try disabling
>> delegations.
>>
>>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>
>>> Then when you say "server hangs" you mean that the entire NFS server
>>> system deadlocks. It's not just unresponsive on one or more exports.
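
For reference, the delegation count mentioned at the top of this message (lslocks | grep DELE | wc) can also be sampled directly from /proc/locks. The following is only a rough sketch. It assumes the usual /proc/locks layout, where the lock type (POSIX, FLOCK, LEASE, DELEG) is the second field and blocked waiters carry a "->" marker before the type. The function name count_lock_types and the sample entries are made up for illustration and are not taken from the affected server.

```python
from collections import Counter


def count_lock_types(proc_locks_text: str) -> Counter:
    """Count /proc/locks entries per lock type (POSIX, FLOCK, LEASE, DELEG, ...)."""
    counts = Counter()
    for line in proc_locks_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or unexpected line
        # Assumed layout: "N: TYPE ..." or, for a blocked waiter, "N: -> TYPE ...".
        type_index = 2 if fields[1] == "->" else 1
        if type_index < len(fields):
            counts[fields[type_index]] += 1
    return counts


# Hypothetical sample in the assumed /proc/locks format (not real server data).
SAMPLE = """\
1: DELEG  ACTIVE    READ  1234 00:2b:100 0 EOF
2: POSIX  ADVISORY  WRITE 3568 fd:00:2531452 0 EOF
2: -> POSIX  ADVISORY  WRITE 3570 fd:00:2531452 0 EOF
3: DELEG  ACTIVE    READ  1234 00:2b:101 0 EOF
"""

sample_counts = count_lock_types(SAMPLE)
print("DELEG entries:", sample_counts["DELEG"])
```

On a live server the same function could be fed the contents of /proc/locks (for example via pathlib.Path("/proc/locks").read_text()) at intervals, alongside the current value of /proc/sys/fs/leases-enable, to see whether the DELEG count changes over time after the setting is flipped.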