Hi Dave,

On Sat, Aug 31, 2013 at 04:14:20PM +1000, Dave Chinner wrote:
> On Fri, Aug 30, 2013 at 01:15:20PM -0500, Ben Myers wrote:
> > Dave,
> >
> > On Tue, Aug 27, 2013 at 11:39:37AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > >
> > > CRC enabled filesystems fail log recovery with 100% reliability on
> > > xfstests xfs/085 with the following failure:
> >
> > Unfortunately I have not been able to hit this one... not sure why.
> >
> > > XFS (vdb): Mounting Filesystem
> > > XFS (vdb): Starting recovery (logdev: internal)
> > > XFS (vdb): Corruption detected. Unmount and run xfs_repair
> > > XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
> > > XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
> > >
> > > The problem is that the inode buffer has not been recovered before
> > > the readahead on the inode buffer is issued. The checkpoint being
> > > recovered actually allocates the inode chunk we are doing readahead
> > > from, so what comes from disk during readahead is essentially
> > > random and the verifier barfs on it.
> > >
> > > This inode buffer readahead problem affects non-crc filesystems,
> > > too, but xfstests does not trigger it at all on such
> > > configurations....
> > >
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > I've been mulling this one over for a bit, and I'm not quite sure this
> > is correct:
> >
> > My feeling is that in light of commit 9222a9cf, if we do take part of a
> > buffer back in time, the write verifier should fail.
>
> I don't see the connection between 9222a9cf ("xfs: don't shutdown
> log recovery on validation errors") and this issue. 9222a9cf works
> around are a longstanding architectural deficiency of log
> recovery, while this is a completely new problem introduced by the
> inode buffer readahead in log recovery.


Commit 9222a9cf left b_ops clear for inode buffers in the v2 inode case:

@@ -1845,7 +1845,13 @@ xlog_recover_do_inode_buffer(
 	xfs_agino_t		*buffer_nextp;

 	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
-	bp->b_ops = &xfs_inode_buf_ops;
+
+	/*
+	 * Post recovery validation only works properly on CRC enabled
+	 * filesystems.
+	 */
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		bp->b_ops = &xfs_inode_buf_ops;

So after that commit the recovery path looks like this:

xlog_recover_commit_trans
  xlog_recover_items_pass2
    xlog_recover_buffer_pass2
      xlog_recover_do_inode_buffer
        if (xfs_sb_version_hascrc(&mp->m_sb))
          bp->b_ops = &xfs_inode_buf_ops;

My concern is that with the readahead we have:

xlog_recover_commit_trans
. xlog_recover_ra_pass2
.   xlog_recover_inode_ra_pass2
.     xfs_buf_readahead
.       xfs_buf_readahead_map
.         xfs_buf_read_map
.           if (!XFS_BUF_ISDONE(bp))
.             bp->b_ops = ops;
  xlog_recover_items_pass2
    xlog_recover_buffer_pass2
      xlog_recover_do_inode_buffer
        if (xfs_sb_version_hascrc(&mp->m_sb))
          bp->b_ops = &xfs_inode_buf_ops;

It looks like we can set b_ops in xfs_buf_read_map in the v2 inode case,
and it would then remain set through recovery when we intend it to be
clear. If we needed b_ops to be clear in commit 9222a9cf, I think it
should also be clear in the readahead case.

Here's what I suggest:

---
 fs/xfs/xfs_log_recover.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: b/fs/xfs/xfs_log_recover.c
===================================================================
--- a/fs/xfs/xfs_log_recover.c	2013-09-03 16:57:51.534388540 -0500
+++ b/fs/xfs/xfs_log_recover.c	2013-09-03 16:59:13.784398092 -0500
@@ -3309,7 +3309,9 @@ xlog_recover_inode_ra_pass2(
 		return;

 	xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
-			  ilfp->ilf_len, &xfs_inode_buf_ra_ops);
+			  ilfp->ilf_len,
+			  xfs_sb_version_hascrc(&mp->m_sb) ?
+				&xfs_inode_buf_ra_ops : NULL);
 }

 STATIC void

Thanks,
	Ben
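
P.S. To make the concern concrete, here is a toy userspace model of the two call chains above. It is only a sketch, not kernel code: every toy_* name is made up for illustration, the buffer is reduced to a "done" flag and an ops pointer, and the only behaviour it borrows from the real code is "readahead attaches ops if the buffer is not already done" and "replay attaches ops only on CRC filesystems". The gate_readahead flag stands in for the change I suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real XFS structures. */
struct toy_buf_ops { const char *name; };
struct toy_buf {
	bool				done;
	const struct toy_buf_ops	*ops;
};

static const struct toy_buf_ops toy_inode_buf_ops = { "inode_buf" };
static const struct toy_buf_ops toy_inode_buf_ra_ops = { "inode_buf_ra" };

/* Models the read path: ops are attached only if the buffer is not done. */
static void toy_readahead(struct toy_buf *bp, const struct toy_buf_ops *ops)
{
	if (!bp->done)
		bp->ops = ops;
	bp->done = true;
}

/* Models inode buffer replay after 9222a9cf: ops set only with CRCs. */
static void toy_recover_inode_buffer(struct toy_buf *bp, bool hascrc)
{
	if (hascrc)
		bp->ops = &toy_inode_buf_ops;
}

/*
 * Run readahead followed by replay on a fresh buffer and return whatever
 * ops are left attached. gate_readahead models passing NULL ops to the
 * readahead on non-CRC filesystems.
 */
const struct toy_buf_ops *toy_replay(bool hascrc, bool gate_readahead)
{
	struct toy_buf bp = { false, NULL };
	const struct toy_buf_ops *ra_ops = &toy_inode_buf_ra_ops;

	if (gate_readahead && !hascrc)
		ra_ops = NULL;

	toy_readahead(&bp, ra_ops);
	toy_recover_inode_buffer(&bp, hascrc);
	return bp.ops;
}
```

In this model toy_replay(false, false) comes back with the readahead ops still attached, which is the v2 inode case I am worried about, while toy_replay(false, true) comes back with ops clear. The CRC case ends up with the inode buffer ops either way.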