> What happens on your network every 14 days or so? Is there a rogue
> client side backup or admin task running somewhere?
>
Well, we run nightly backups, but that's read ops. When I look at the
load, it's not particularly more loaded at that time than during normal
work.
>
> Does this repeat every 120s?
No, what I sent is the full trace. It happened around 23:23, but there
are no more XFS errors in the log (which is on the ext4 OS disk). It was
working when I came in the following morning around 6:45, and worked for
some time, but initially failed, and we had to reboot the server to get
NFS exports to work. As I said, even though the fs was inaccessible from
NFS, I could `ls` the filesystem locally. We really have no indication of
it being an NFS problem, as we only see the XFS problem. It could still
boil down to an NFS problem, I just wasn't sure how to read the XFS trace.
> These hung task warnings can happen if your workload has overloaded
> your raid array and everything doing IO hangs while it catches up.
> e.g. you have 6GB of random 4k writes in the controller NV cache and
> it takes minutes for it to flush (because random 4k writes are slow)
> and make room for new incoming IO....
>
> If the warnings don't repeat, then it means it was a temporary
> overload. If the warnings repeat, but change processes and stack
> traces then it's a sustained overload condition. If exactly the same
> warnings repeat and/or has stalled and doesn't restart, then we've
> got some kind of hang occurring and we'll need to look into it
> further.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
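
For reference, the triage rule quoted above (no repeat, repeat with
different tasks, repeat with identical warnings) could be sketched
roughly as below. This is only an illustration, not a tool from this
thread: the function name `triage`, the idea of grouping warnings into
one "report" per hung-task check interval, and the choice of what
identifies a warning are all assumptions.

```python
def triage(reports):
    """Classify a sequence of hung-task reports.

    Assumption: each report is the set of warnings seen in one check
    interval, and a warning is any hashable key chosen by the caller,
    e.g. (process name, pid, top stack frame).
    """
    # Ignore intervals in which nothing was reported.
    reports = [set(r) for r in reports if r]
    if not reports:
        return "no warnings"
    if len(reports) == 1:
        # The warnings did not repeat.
        return "temporary overload"
    # The same warning showing up in consecutive reports suggests
    # that a task never made progress in between.
    for earlier, later in zip(reports, reports[1:]):
        if earlier & later:
            return "possible hang"
    # Warnings repeat, but the tasks and traces keep changing.
    return "sustained overload"
```

With hypothetical keys, `triage([{("nfsd", 2345, "frame_a")}])` gives
"temporary overload", which would match the single trace reported here,
while the same key appearing in two consecutive reports gives
"possible hang".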