On Tue, Apr 19, 2016 at 05:56:19PM +0800, Hugo Kuo wrote:
> Hi XFS team,
>
> We encountered a problem frequently in past three weeks. Our daemons store
> data to XFS partition associate with xattr.
>
> Disk seems not responding since all processes to this disk in D state and
> can't be killed at all.
>
>    - It happens on several disks. I feel it's randomly.
>    - Reboot seems solve the problem temporarily.
>    - All disks are multipath devices.
>
>
> I suspected that's an issue from disk corrupted at beginning. But smartctl
> doesn't show any clue about disk bad. And reboot makes the problem gone
> away.
>
>
>    - Any process to this disk is blocked. Even a simple $ls . Kernel log
>    <https://gist.github.com/HugoKuo/f87748786b26ea04fd9e1d86d9538293>

Looks like it's waiting on an AGF buffer. The buffer could be held by
something else, but we don't have enough information from that one
trace. Could you get all of the blocked tasks when in this state (e.g.,
"echo w > /proc/sysrq-trigger")?

>    - I tested the disk by read bytes on block via $dd . It works fine
>    without any error in dmesg.
>    - The `xfs_repair -n` output of a problematic mount point [xfs_repair -n]
>    <https://gist.github.com/HugoKuo/76f65bdc0b860ca6ed5e786f8c43da0e> . It
>    is still processing.

I presume this was run after a forced reboot..? If so, was the
filesystem remounted first to replay the log (xfs_repair -n doesn't
detect/warn about a dirty log, iirc). If the log was dirty, then repair
is a bit less interesting simply because some corruption is to be
expected in that scenario.

>    - Kernel : Linux node9 2.6.32-573.8.1.el6.x86_64 #1 SMP Tue Nov 10
>    18:01:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>    - OS : CentOS release 6.5 (Final)
>    - XFS : xfsprogs.x86_64 3.1.1-14.el6
>
>
> There's an interesting behaviour of $ls command.
>
> * This is completed in 1sec. Very quick and give me the result in the
> test.d864 file $ls /srv/node/d864/tmp > test.d864
> * This is hanging $ls /srv/node/d864/tmp
>

I'm not following you here. Are you missing an attachment (test.d864)?

Brian

> [image: Inline image 1]
>
> I suspect there's something wrong with imap. Is there a known bug ?
>
> Thanks // Hugo

> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
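
The two requests in the reply (dump all blocked tasks via sysrq, and mount once to replay the log before running xfs_repair -n) could be scripted roughly as below. This is only a sketch, not something from the thread: the function names, the default output path, and the device and mount-point arguments are all hypothetical, and it assumes root plus a procps-style ps. Note that the mount step may itself block if the filesystem is still hung.

```shell
#!/bin/sh
# Hypothetical helper: keep the header row plus any row whose STAT column
# (second field) starts with D, i.e. uninterruptible sleep.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical helper: list tasks currently in D state, with the kernel
# function they are waiting in (WCHAN).
list_blocked_tasks() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}

# Hypothetical helper: ask the kernel to log the stacks of all blocked
# tasks (sysrq 'w'), then save the kernel log. Needs root.
# The default output path is an assumption.
dump_blocked_tasks() {
    out=${1:-/tmp/blocked-tasks.txt}
    echo w > /proc/sysrq-trigger || return 1
    dmesg > "$out"
}

# Hypothetical helper: mount and unmount once so the XFS log is replayed,
# then run the read-only check. $1 = block device, $2 = mount point
# (both supplied by the caller; nothing here is taken from the thread).
replay_log_then_check() {
    dev=$1
    mnt=$2
    mount "$dev" "$mnt" || return 1
    umount "$mnt" || return 1
    xfs_repair -n "$dev"
}
```

With the hang reproduced, running list_blocked_tasks followed by dump_blocked_tasks as root would capture the state to attach to a reply; replay_log_then_check would only be run after a reboot, on an unmounted filesystem.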