Re: Strange XFS problem

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 13 Sep 2018 16:18:33 +1000

On Thu, Sep 13, 2018 at 07:21:59AM +0200, Troels Hansen wrote:
> 
> > 
> > What happens on your network every 14 days or so? Is there a rogue
> > client side backup or admin task running somewhere?
> > 
> 
> Well, we run nightly backups, but that's read ops.

Yup, but that can get stuck modifying atime, like the bacula process
in the hung process traces. :)

Hmmm - just a thought - it's hardware raid - it's not running a
background admin op like a media scrub every 14 days, is it?

> When I look at the load, it's not particularly more loaded at that time than during normal work.

OK.

> > Does this repeat every 120s?
> 
> No, what I sent is the full trace. It happened around 23:23, but
> no more XFS errors in the log (which is on the ext4 OS disk).

Ok, so those processes reported as hung have been woken and made
progress again. It seems like a temporary overload situation.

> It was working when I came in the following morning around 6:45,
> and worked for some time, but initially failed, and we had to
> reboot the server to get NFS exports to work.  But, as I said,
> even though the fs was inaccessible from NFS I could `ls` the
> filesystem locally, but we really have no indication of it being
> an NFS problem, as we only see the XFS problem.

That could be the same problem, with all the kernel nfsds blocked
waiting for the filesystem so no new NFS requests could be
processed.  How many kernel nfsd threads do you run?  Local
operations can still be done (don't go through nfsds), and they
won't be slow if they hit the caches rather than have to retrieve
data from disk.
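
For reference, a minimal sketch of one way to check the configured
nfsd thread count. It assumes the nfsd filesystem is mounted so that
/proc/fs/nfsd/threads exists and holds a single integer; the helper
names are made up for illustration.

```python
from pathlib import Path

# Assumption: this file exists only when the nfsd filesystem is mounted
# (i.e. the kernel NFS server is in use on this machine).
NFSD_THREADS = Path("/proc/fs/nfsd/threads")


def parse_thread_count(text):
    """Parse the file contents, expected to be a single integer."""
    return int(text.split()[0])


def nfsd_thread_count(path=NFSD_THREADS):
    """Return the configured nfsd thread count, or None if unavailable."""
    try:
        return parse_thread_count(path.read_text())
    except (OSError, ValueError, IndexError):
        # Missing file, unreadable file, or unexpected contents.
        return None


if __name__ == "__main__":
    print("nfsd threads:", nfsd_thread_count())
```

If every one of those threads is blocked on the filesystem, new NFS
requests have nowhere to go, which would match what you saw.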

> It could also boil down to a NFS problem, I just wasn't sure how
> to read the XFS trace.

Like you, I don't think this is an NFS problem - it smells more of
how huge hardware writeback caches in front of slow disks using
RAID5/6 behave.

i.e. Flushing 100MB of sequential write data from the cache takes a
fraction of a second, flushing 100MB of random 4k write data to
RAID5 LUNs can take minutes. While the hardware cache and flushing are
supposed to be completely invisible to the OS, we can see its
impact via unexpectedly high device utilisations and long IO times
for otherwise normal IO loads.
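
A back-of-the-envelope sketch of that difference. All the hardware
numbers are assumptions picked for illustration, not measurements of
this array: 8 disks at roughly 100 random IOPS each, 500MiB/s of
sequential throughput, and the usual RAID5 small-write model of 4
backend IOs per write (read data, read parity, write data, write
parity). A real controller may coalesce writes and do better.

```python
MiB = 1024 * 1024


def sequential_flush_seconds(cache_bytes, throughput_bytes_per_s):
    """Time to stream the cached data out at a given sequential rate."""
    return cache_bytes / throughput_bytes_per_s


def random_flush_seconds(cache_bytes, io_bytes, disks, iops_per_disk,
                         write_penalty=4):
    """Time to flush small random writes under a read-modify-write model.

    write_penalty=4 is the assumed RAID5 cost of one small write.
    """
    writes = cache_bytes // io_bytes
    backend_ios = writes * write_penalty
    return backend_ios / (disks * iops_per_disk)


cache = 100 * MiB

# Assumed: 500 MiB/s sequential throughput for the whole array.
seq = sequential_flush_seconds(cache, 500 * MiB)

# Assumed: 8 disks, ~100 random IOPS each, 4 KiB writes.
rnd = random_flush_seconds(cache, 4096, disks=8, iops_per_disk=100)

print(f"sequential: {seq:.1f}s, random 4k on RAID5: {rnd:.0f}s")
```

With those assumed figures the sequential flush comes out at about
0.2s and the random one at about 128s, i.e. a couple of minutes for
the same 100MB.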

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx