Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load


IIRC most of the NFS server tuning options require an NFS service restart to take effect.
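
One way to check whether the delegation count actually drains after such a restart is to read /proc/locks directly. lslocks reports from this kernel lock table, and NFS delegations appear in it with the type DELEG, which is what `lslocks | grep DELE | wc` matches. Below is a minimal Python sketch. It assumes the usual /proc/locks line layout, and the helper name count_delegations is hypothetical:

```python
from pathlib import Path


def count_delegations(locks_text: str) -> int:
    """Count granted NFS delegations in text formatted like /proc/locks.

    Assumption: each granted entry looks like "N: TYPE ...", where TYPE is
    e.g. POSIX, FLOCK, LEASE or DELEG. Waiters blocked on an entry are
    printed as "N: -> TYPE ...", so their second field is "->" and they
    are intentionally not counted.
    """
    count = 0
    for line in locks_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "DELEG":
            count += 1
    return count


if __name__ == "__main__":
    locks = Path("/proc/locks")
    leases = Path("/proc/sys/fs/leases-enable")
    # Only read the files if they exist (i.e. on a Linux host).
    if locks.exists():
        print("DELEG entries:", count_delegations(locks.read_text()))
    if leases.exists():
        print("fs.leases-enable =", leases.read_text().strip())
```

Running it once before and once after the NFS service restart and comparing the two counts shows whether outstanding delegations are going away. No particular value is implied here.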

----- Original Message -----
> From: "hedrick" <hedrick@xxxxxxxxxxx>
> To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> Cc: "Chuck Lever" <chuck.lever@xxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "linux-nfs"
> <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Monday, August 9, 2021 1:38:33 PM
> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

> Does setting /proc/sys/fs/leases-enable to 0 work while the system is up? I was
> expecting to see lslocks | grep DELE | wc go down. It’s not. It’s staying
> around 1850.
> 
>> On Aug 9, 2021, at 2:30 PM, Timothy Pearson <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> wrote:
>> 
>> FWIW that's *exactly* what we see.  Eventually, if the server is left alone for
>> enough time, even the login system stops responding -- it's as if the I/O
>> subsystem degrades and eventually blocks entirely.
>> 
>> ----- Original Message -----
>>> From: "hedrick" <hedrick@xxxxxxxxxxx>
>>> To: "Chuck Lever" <chuck.lever@xxxxxxxxxx>
>>> Cc: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>, "J. Bruce Fields"
>>> <bfields@xxxxxxxxxxxx>, "linux-nfs"
>>> <linux-nfs@xxxxxxxxxxxxxxx>
>>> Sent: Monday, August 9, 2021 1:29:30 PM
>>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
>> 
>>> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
>>> occurred I saw a process running rpciod at 100% CPU. I tried to do a “sync”
>>> and reboot, but the sync hung.
>>> 
>>> The last time, I couldn’t collect any data, but the kernel was running and
>>> responding to ping. An ssh session responded to CR, but when I tried to sudo,
>>> it hung. An attempt to log in also hung. Oddly, even though the ssh session
>>> responded to CR, syslog entries on the local system stopped until the reboot.
>>> However, we also send syslog entries to a central server. Those continued and
>>> showed a continuing set of mounts and unmounts happening through the reboot.
>>> 
>>> I was going to get a stack trace of the process at 100% CPU if that happened
>>> again, but last time I wasn’t in a position to do that. I don’t think users
>>> will put up with further attempts to debug, so for the moment I’m going to try
>>> disabling delegations.
>>> 
>>>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>> 
>>>> Then when you say "server hangs" you mean that the entire NFS server
>>>> system deadlocks. It's not just unresponsive on one or more exports.
