On Wed, Jul 30, 2014 at 11:48:47AM +1000, Dave Chinner wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > Crash testing of CRC enabled filesystems has resulted in a number of > reports of bad CRCs being detected after the filesystem was mounted. > Errors such as the following were being seen: > > XFS (sdb3): Mounting V5 Filesystem > XFS (sdb3): Starting recovery (logdev: internal) > XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1 > XFS (sdb3): Unmount and run xfs_repair > XFS (sdb3): First 64 bytes of corrupted metadata buffer: > ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40 XAGF...........@ > ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01 ..mS..w......... > ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03 ................ > ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00 ................ > XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1 > > The errors were typically being seen in AGF, AGI and their related > btree block buffers some time after log recovery had run. Often it > wasn't until later subsequent mounts that the problem was > discovered. The common symptom was a buffer with the correct > contents, but a CRC and an LSN that matched an older version of the > contents. > > Some debug added to _xfs_buf_ioapply() indicated that buffers were > being written without verifiers attached to them from log recovery, > and Jan Kara isolated the cause to log recovery readahead an dit's > interactions with buffers that had a more recent LSN on disk than > the transaction being recovered. In this case, the buffer did not > get a verifier attached, and os when the second phase of log > recovery ran and recovered EFIs and unlinked inodes, the buffers > were modified and written without the verifier running. Hence they > had up to date contents, but stale LSNs and CRCs. > > Fix it by attaching verifiers to buffers we skip due to future LSN > values so they don't escape into the buffer cache without the > correct verifier attached. > > This patch is based on analysis and a patch from Jan Kara. > > cc: <stable@xxxxxxxxxxxxxxx> > Reported-by: Jan Kara <jack@xxxxxxx> > Reported-by: Fanael Linithien <fanael4@xxxxxxxxx> > Reported-by: Grozdan <neutrino8@xxxxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > fs/xfs/xfs_log_recover.c | 47 +++++++++++++++++++++++++++++++---------------- > 1 file changed, 31 insertions(+), 16 deletions(-) > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index fbc2362..0a015cc 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -2126,6 +2126,17 @@ xlog_recover_validate_buf_type( > __uint16_t magic16; > __uint16_t magicda; > > + /* > + * We can only do post recovery validation on items on CRC enabled > + * fielsystems as we need to know when the buffer was written to be able > + * to determine if we should have replayed the item. If we replay old > + * metadata over a newer buffer, then it will enter a temporarily > + * inconsistent state resulting in verification failures. Hence for now > + * just avoid the verification stage for non-crc filesystems > + */ > + if (!xfs_sb_version_hascrc(&mp->m_sb)) > + return; > + This function has a couple other similar *_hascrc() checks further down that we should probably kill now. Otherwise, looks Ok... Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > magic32 = be32_to_cpu(*(__be32 *)bp->b_addr); > magic16 = be16_to_cpu(*(__be16*)bp->b_addr); > magicda = be16_to_cpu(info->magic); > @@ -2197,10 +2208,6 @@ xlog_recover_validate_buf_type( > #endif > break; > case XFS_BLFT_DINO_BUF: > - /* > - * we get here with inode allocation buffers, not buffers that > - * track unlinked list changes. > - */ > if (magic16 != XFS_DINODE_MAGIC) { > xfs_warn(mp, "Bad INODE block magic!"); > ASSERT(0); > @@ -2388,16 +2395,7 @@ xlog_recover_do_reg_buffer( > /* Shouldn't be any more regions */ > ASSERT(i == item->ri_total); > > - /* > - * We can only do post recovery validation on items on CRC enabled > - * fielsystems as we need to know when the buffer was written to be able > - * to determine if we should have replayed the item. If we replay old > - * metadata over a newer buffer, then it will enter a temporarily > - * inconsistent state resulting in verification failures. Hence for now > - * just avoid the verification stage for non-crc filesystems > - */ > - if (xfs_sb_version_hascrc(&mp->m_sb)) > - xlog_recover_validate_buf_type(mp, bp, buf_f); > + xlog_recover_validate_buf_type(mp, bp, buf_f); > } > > /* > @@ -2505,12 +2503,29 @@ xlog_recover_buffer_pass2( > } > > /* > - * recover the buffer only if we get an LSN from it and it's less than > + * Recover the buffer only if we get an LSN from it and it's less than > * the lsn of the transaction we are replaying. > + * > + * Note that we have to be extremely careful of readahead here. > + * Readahead does not attach verfiers to the buffers so if we don't > + * actually do any replay after readahead because of the LSN we found > + * in the buffer if more recent than that current transaction then we > + * need to attach the verifier directly. Failure to do so can lead to > + * future recovery actions (e.g. EFI and unlinked list recovery) can > + * operate on the buffers and they won't get the verifier attached. This > + * can lead to blocks on disk having the correct content but a stale > + * CRC. > + * > + * It is safe to assume these clean buffers are currently up to date. > + * If the buffer is dirtied by a later transaction being replayed, then > + * the verifier will be reset to match whatever recover turns that > + * buffer into. > */ > lsn = xlog_recover_get_buf_lsn(mp, bp); > - if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) > + if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) { > + xlog_recover_validate_buf_type(mp, bp, buf_f); > goto out_release; > + } > > if (buf_f->blf_flags & XFS_BLF_INODE_BUF) { > error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f); > -- > 2.0.0 > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs