Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

Does setting /proc/sys/fs/leases-enable to 0 take effect while the system is up? I was expecting the count from lslocks | grep DELE | wc to go down. It hasn’t; it’s staying around 1850.
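
For anyone who wants to watch that number over time rather than rerun the pipeline by hand, here is a rough sketch. It assumes delegations show up in /proc/locks with DELEG in the second field, which is what I believe lslocks reads; the function names and the polling interval are just placeholders of mine.

```python
import time


def count_delegations(locks_text):
    """Count entries whose type field is DELEG in /proc/locks-style text."""
    count = 0
    for line in locks_text.splitlines():
        fields = line.split()
        # Assumed layout: "<n>: <TYPE> <STATE> <ACCESS> <pid> <dev:inode> <start> <end>"
        if len(fields) >= 2 and fields[1] == "DELEG":
            count += 1
    return count


def read_proc_locks(path="/proc/locks"):
    """Return the current contents of the kernel lock table."""
    with open(path) as f:
        return f.read()


def watch(interval=10, samples=6):
    """Print a timestamped delegation count every `interval` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), count_delegations(read_proc_locks()))
        time.sleep(interval)
```

Calling watch() on the server right after changing the sysctl should show whether the count is flat, falling, or still growing.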

> On Aug 9, 2021, at 2:30 PM, Timothy Pearson <tpearson@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> FWIW that's *exactly* what we see.  Eventually, if the server is left alone for enough time, even the login system stops responding -- it's as if the I/O subsystem degrades and eventually blocks entirely.
> 
> ----- Original Message -----
>> From: "hedrick" <hedrick@xxxxxxxxxxx>
>> To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>
>> Cc: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "linux-nfs"
>> <linux-nfs@xxxxxxxxxxxxxxx>
>> Sent: Monday, August 9, 2021 1:29:30 PM
>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
> 
>> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
>> occurred I saw a process at 100% running rpciod. I tried to do a “sync” and
>> reboot, but the sync hung.
>> 
>> The last time I couldn’t get data, but the kernel was running and responding to
>> ping. An ssh session responded to CR but when I tried to sudo it hung. Attempt
>> to login hung. Oddly, even though the ssh session responded to CR, syslog
>> entries on the local system stopped until the reboot. However we also send
>> syslog entries to a central server. Those continued and showed a continuing set
>> of mounts and unmounts happening through the reboot.
>> 
>> I was going to get a stack trace of the 100% process if that happened again, but
>> last time I wasn’t in a situation to do that. I don’t think users will put up
>> with further attempts to debug, so for the moment I’m going to try disabling
>> delegations.
>> 
>>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> Then when you say "server hangs" you mean that the entire NFS server
>>> system deadlocks. It's not just unresponsive on one or more exports.




