Re: file server freezes with all nfsds stuck in D state after upgrade to Debian bookworm

Dear Bob,

thanks a lot for your input.

> >>>> That was our first idea too, but we haven't found any indication that this is the case. The XFS file systems seem perfectly fine when all nfsds are in D state, and we can
> >>>> read from them and write to them. If XFS were blocking NFS I/O, this should
> >>>> affect other processes too, right?
> >>> It's possible that the NFSD threads are waiting on I/O to a particular filesystem block. XFS is not likely to block other activity in this case.
> >> OK, good to know. So far we were under the impression that a file system would
> >> block as a whole.
> > 
> > XFS tries to operate in parallel as much as it can. Maybe other filesystems aren't as capable.
> > 
> > If the unresponsive block is part of a superblock or the journal (i.e., shared metadata), I would expect XFS to become unresponsive. For I/O on blocks containing file data, it is likely to behave more robustly.
> > 
> 
> Pretty sure we have seen a similar issue, never fully explained. From what I recall, the server gets into a low-memory state. At that point, efforts to coalesce writes are abandoned, and each write request is processed inline (vs. scheduled); all nfsds then pile up in D. Writes keep arriving faster than the server can keep up with. But the back-end store (a high-end NetApp RAID 6 with 240 drives, also with XFS) had very little load, not too busy. Never fully explained it, but Chuck's point on the shared metadata block may be a good place to look, along with whether inline writes at low memory could compound it. IIRC, we worked around it with releases and tunables like minfree kmem et al., which reduced the problem but did not eliminate it. I'm away from reference material for a while, but I'll review and update if I find anything.

We'll certainly look into this, but right now it's hard to imagine, since I've
never seen the file server use more than ~10G of its 64G of RAM (excluding page
cache, of course). We're not even sure that heavy writes trigger the problem; in
one case our monitoring showed a lot of reads leading up to the freeze.
OTOH, if our issue could be resolved by throwing a few more DIMMs into the
server, all the better.
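
To test the low-memory theory at the next freeze, something like the following
could be run on the server. It is only a minimal sketch, assuming a Linux /proc
filesystem; the helper names (parse_meminfo, parse_stat, stuck_tasks) are made
up for illustration. It prints memory headroom from /proc/meminfo and lists the
nfsd threads in D state together with their wchan (the kernel function each one
is sleeping in). wchan may read as "0" unless the script runs as root.

```python
import os
from typing import Dict, List, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            # Most fields are in kB; a few (e.g. HugePages_Total) are counts.
            info[key.strip()] = int(parts[0])
    return info


def parse_stat(text: str) -> Tuple[str, str]:
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    start = text.index("(")
    end = text.rindex(")")
    comm = text[start + 1:end]
    state = text[end + 1:].split()[0]
    return comm, state


def stuck_tasks(prefix: str = "nfsd") -> List[Tuple[int, str, str]]:
    """List (pid, comm, wchan) for tasks whose name starts with prefix and
    which are in uninterruptible sleep (state D)."""
    result = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, state = parse_stat(f.read())
            if state != "D" or not comm.startswith(prefix):
                continue
            with open(f"/proc/{pid}/wchan") as f:
                wchan = f.read().strip()
        except OSError:
            # The task may have exited between listdir() and open().
            continue
        result.append((pid, comm, wchan))
    return result


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    for key in ("MemTotal", "MemAvailable", "Dirty", "Writeback"):
        print(f"{key}: {mem.get(key, 0) // 1024} MiB")
    for pid, comm, wchan in stuck_tasks():
        print(pid, comm, wchan)
```

If all nfsds turn out to be parked in the same function, `cat /proc/<pid>/stack`
as root, or `echo w > /proc/sysrq-trigger` to dump all blocked tasks to the
kernel log, would give full kernel stacks rather than just the wait channel.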


thanks,
-Christian



