Hi,

On Wed, Apr 19, 2017 at 12:58:05PM +0200, Michael Weissenbacher wrote:
> Hi List!
> I have a storage server which primarily does around 15-20 parallel
> rsyncs, nothing special. Sometimes (3-4 times a day) I notice that all
> I/O on the file system suddenly comes to a halt and the only process
> that continues to do any I/O (according to iotop) is the process
> xfsaild/md127. When this happens, xfsaild only does reads (according to
> iotop) and is consistently in D state (according to top).
> Unfortunately this can sometimes stay like this for 5-15 minutes. During
> this time even a simple "ls" or "touch" would block and be stuck in D
> state. All other running processes accessing the fs are of course also
> stuck in D state. It is an XFS V5 filesystem.
> Then again, as suddenly as it began, everything goes back to normal and
> I/O continues. The problem is accompanied by several "process blocked
> for xxx seconds" messages in dmesg and also some dropped connections due
> to network timeouts.
>
> I've tried several things to remedy the problem, including:
> - changing I/O schedulers (tried noop, deadline and cfq). Deadline
>   seems to be best (the block goes away in less time compared with the
>   others).
> - removing all mount options (defaults + usrquota, grpquota)
> - upgrading to the latest 4.11.0-rc kernel (before that I was on 4.9.x)
>
> None of the above seemed to make a significant change to the
> problem.
>
> xfs_info output of the fs in question:
> meta-data=/dev/md127             isize=512    agcount=33, agsize=268435440 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1 spinodes=0 rmapbt=0
>          =                       reflink=0
> data     =                       bsize=4096   blocks=8789917696, imaxpct=10
>          =                       sunit=16     swidth=96 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>

This is really not enough to tell what might be happening, although it
looks like slow storage while xfsaild is flushing the log. We need more
information to get a better idea of what is going on; please look at:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Especially: the storage layout (RAID arrays, LVM, thin provisioning, etc.)
and the dmesg output with the traces from the hung tasks.

Cheers.

> Storage Subsystem: Dell Perc H730P Controller 2GB NVCACHE, 12 6TB Disks,
> RAID-10, latest Firmware Updates
>
> I would be happy to dig out more information if needed. How can I find
> out if the RAID Controller itself gets stuck? Nothing bad shows up in
> the hardware and SCSI controller logs.
>

--
Carlos
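
As a reading aid for the xfs_info output quoted above, the block counts can be converted into byte sizes with a few lines of Python. This is only a sketch: the constants are copied from the quoted output, the function name geometry() is made up for illustration, and treating swidth / sunit as the number of data-bearing stripe members is the usual reading of those two fields, not something stated in the thread.

```python
# Values copied from the xfs_info output quoted in this thread.
BLOCK_SIZE = 4096            # bsize: bytes per filesystem block
DATA_BLOCKS = 8_789_917_696  # data section, "blocks="
LOG_BLOCKS = 521_728         # internal log, "blocks="
SUNIT_BLOCKS = 16            # data section stripe unit, in fs blocks
SWIDTH_BLOCKS = 96           # data section stripe width, in fs blocks


def geometry():
    """Derive byte sizes from the block counts (hypothetical helper)."""
    return {
        "data_bytes": DATA_BLOCKS * BLOCK_SIZE,
        "log_bytes": LOG_BLOCKS * BLOCK_SIZE,
        "stripe_unit_bytes": SUNIT_BLOCKS * BLOCK_SIZE,
        "stripe_width_bytes": SWIDTH_BLOCKS * BLOCK_SIZE,
        # Assumption: width / unit = number of data-bearing stripe members.
        "data_stripe_members": SWIDTH_BLOCKS // SUNIT_BLOCKS,
    }


if __name__ == "__main__":
    g = geometry()
    print(f"data section : {g['data_bytes'] / 2**40:.2f} TiB")
    print(f"internal log : {g['log_bytes'] // 2**20} MiB")
    print(f"stripe unit  : {g['stripe_unit_bytes'] // 1024} KiB")
    print(f"stripe width : {g['stripe_width_bytes'] // 1024} KiB")
    print(f"data members : {g['data_stripe_members']}")
```

Under that assumption, six data-bearing members would be consistent with the 12-disk RAID-10 described in the quoted report, and the internal log comes out at roughly 2 GiB. Neither figure says anything about why xfsaild stalls; the storage layout details and hung-task traces requested above are still needed for that.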