On Mon, 15 Dec 2014 13:25:00 +0100
Emmanuel Florac <eflorac@xxxxxxxxxxxxxx> wrote:

> Reading the source I see that the error occurred in xfs_buf_read_map, I
> suppose it's when xfsbufd tries to scan dirty metadata? This is a read
> error, so it could very well be a simple IO starvation at the
> controller level (as the controller probably gives priority to
> whatever writes are pending over reads).
>
> Maybe setting xfsbufd_centisecs to the max could help here? Trying
> right away... Any advice welcome.
>

Alas, same thing; dmesg output:

ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): Metadata corruption detected at xfs_inode_buf_verify+0x6c/0xb0, block 0xeffffff40
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 64 bytes of corrupted metadata buffer:
ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
XFS (dm-0): metadata I/O error: block 0xeffffff40 ("xfs_trans_read_buf_map") error 117 numblks 16
XFS (dm-0): xfs_do_force_shutdown(0x1) called from line 383 of file fs/xfs/xfs_trans_buf.c. Return address = 0xffffffff8125cc90
XFS (dm-0): I/O Error Detected. Shutting down filesystem
XFS (dm-0): Please umount the filesystem and rectify the problem(s)
XFS (dm-0): xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
XFS (dm-0): xfs_log_force: error 5 returned.
XFS (dm-0): xfs_log_force: error 5 returned.

There is no IO error at the RAID controller level at all. The buffer
hasn't been overwritten with zeros; I'm pretty sure the read actually
timed out and simply returned nothing.
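
For anyone who wants to check a longer log than the excerpt above, here is a minimal Python sketch (not part of any XFS tooling) that pulls the bytes out of the hex-dump lines and tests whether the buffer is entirely zero. The names DUMP_LINE, ERRNO_NAMES, dump_bytes and is_all_zero are made up for this illustration, and the regular expression assumes dmesg lines without a timestamp prefix, as pasted here. The errno names are hard-coded from the Linux values: 5 is EIO, and 117 is EUCLEAN, which XFS uses as EFSCORRUPTED.

```python
import re

# One hex-dump line as pasted above: "<16-digit address>: <16 hex bytes> <ASCII>"
# Assumption: no "[timestamp]" prefix in front of the address.
DUMP_LINE = re.compile(r"^[0-9a-f]{16}:((?: [0-9a-f]{2}){16})")

# Linux errno values seen in the log, hard-coded so the sketch runs anywhere.
ERRNO_NAMES = {5: "EIO", 117: "EUCLEAN (used by XFS as EFSCORRUPTED)"}


def dump_bytes(lines):
    """Collect the bytes from every hex-dump line, skipping other log lines."""
    data = bytearray()
    for line in lines:
        match = DUMP_LINE.match(line.strip())
        if match:
            data += bytes.fromhex(match.group(1).replace(" ", ""))
    return bytes(data)


def is_all_zero(data):
    """True if the buffer is non-empty and contains only zero bytes."""
    return len(data) > 0 and not any(data)


sample = [
    "XFS (dm-0): First 64 bytes of corrupted metadata buffer:",
    "ffff8800df1f5000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................",
    "ffff8800df1f5010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................",
    "ffff8800df1f5020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................",
    "ffff8800df1f5030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................",
]

buf = dump_bytes(sample)
print(len(buf), "bytes, all zero:", is_all_zero(buf))
print("error 117:", ERRNO_NAMES[117])
print("error 5:", ERRNO_NAMES[5])
```

An all-zero dump only says what the verifier saw in memory. On its own it cannot tell a read that returned nothing apart from zeros actually stored on disk.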

IMO this is not a case for an IO error: a retry would almost certainly
succeed. After all, the problem only occurred after more than 8 hours of
continuous heavy read/write activity.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@xxxxxxxxxxxxxx>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs