FWIW that's *exactly* what we see. Eventually, if the server is left alone for long enough, even the login system stops responding; it's as if the I/O subsystem degrades and eventually blocks entirely.

----- Original Message -----
> From: "hedrick" <hedrick@xxxxxxxxxxx>
> To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>
> Cc: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "linux-nfs"
> <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Monday, August 9, 2021 1:29:30 PM
> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
> occurred I saw a process running rpciod at 100% CPU. I tried to do a “sync” and
> reboot, but the sync hung.
>
> The last time I couldn’t get data, but the kernel was running and responding to
> ping. An ssh session responded to CR, but when I tried to sudo it hung. An attempt
> to log in hung. Oddly, even though the ssh session responded to CR, syslog
> entries on the local system stopped until the reboot. However, we also send
> syslog entries to a central server. Those continued and showed a continuing set
> of mounts and unmounts happening through the reboot.
>
> I was going to get a stack trace of the 100% process if that happened again, but
> last time I wasn’t in a situation to do that. I don’t think users will put up
> with further attempts to debug, so for the moment I’m going to try disabling
> delegations.
>
>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>
>> Then when you say "server hangs" you mean that the entire NFS server
>> system deadlocks. It's not just unresponsive on one or more exports.