IIRC most of the NFS server tuning options require an NFS service restart to take effect.

----- Original Message -----
> From: "hedrick" <hedrick@xxxxxxxxxxx>
> To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> Cc: "Chuck Lever" <chuck.lever@xxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "linux-nfs"
> <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Monday, August 9, 2021 1:38:33 PM
> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

> Does setting /proc/sys/fs/leases-enable to 0 work while the system is up? I was
> expecting to see lslocks | grep DELE | wc go down. It’s not. It’s staying
> around 1850.
>
>> On Aug 9, 2021, at 2:30 PM, Timothy Pearson <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> wrote:
>>
>> FWIW that's *exactly* what we see. Eventually, if the server is left alone for
>> enough time, even the login system stops responding -- it's as if the I/O
>> subsystem degrades and eventually blocks entirely.
>>
>> ----- Original Message -----
>>> From: "hedrick" <hedrick@xxxxxxxxxxx>
>>> To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>
>>> Cc: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "J. Bruce Fields"
>>> <bfields@xxxxxxxxxxxx>, "linux-nfs"
>>> <linux-nfs@xxxxxxxxxxxxxxx>
>>> Sent: Monday, August 9, 2021 1:29:30 PM
>>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
>>
>>> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
>>> occurred I saw a process at 100% running rpciod. I tried to do a “sync” and
>>> reboot, but the sync hung.
>>>
>>> The last time I couldn’t get data, but the kernel was running and responding to
>>> ping. An ssh session responded to CR but when I tried to sudo it hung. Attempt
>>> to login hung. Oddly, even though the ssh session responded to CR, syslog
>>> entries on the local system stopped until the reboot. However we also send
>>> syslog entries to a central server. Those continued and showed a continuing set
>>> of mounts and unmounts happening through the reboot.
>>>
>>> I was going to get a stack trace of the 100% process if that happened again, but
>>> last time I wasn’t in a situation to do that. I don’t think users will put up
>>> with further attempts to debug, so for the moment I’m going to try disabling
>>> delegations.
>>>
>>>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>>
>>>> Then when you say "server hangs" you mean that the entire NFS server
>>>> system deadlocks. It's not just unresponsive on one or more exports.
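
For anyone wanting to repeat the delegation count mentioned in the thread (lslocks | grep DELE | wc) while toggling /proc/sys/fs/leases-enable, here is a minimal sketch. It assumes the entries show up in /proc/locks with DELEG as the second whitespace-separated field, which may differ between kernel versions. The function names count_delegations, leases_enabled and snapshot are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def count_delegations(locks_text: str) -> int:
    """Count lock-table lines whose type field is DELEG.

    Assumed layout (not confirmed in the thread): "<id>: <TYPE> ...",
    so the lock type is the second whitespace-separated field.
    """
    count = 0
    for line in locks_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "DELEG":
            count += 1
    return count


def leases_enabled(path: str = "/proc/sys/fs/leases-enable") -> bool:
    """Return True unless the sysctl file currently contains 0."""
    return Path(path).read_text().strip() != "0"


def snapshot(locks_path: str = "/proc/locks") -> tuple[bool, int]:
    """Return (leases enabled?, number of DELEG entries) at this moment."""
    return leases_enabled(), count_delegations(Path(locks_path).read_text())
```

Calling snapshot() a few times, some minutes apart, after writing 0 to leases-enable would show whether the count is actually falling or staying flat as described above.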