On Thu, Jan 07, 2016 at 02:08:30PM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > When we do dquot readahead in log recovery, we do not use a verifier > as the underlying buffer may not have dquots in it. e.g. the > allocation operation hasn't yet been replayed. Hence we do not want > to fail recovery because we detect an operation to be replayed has > not been run yet. This problem was addressed for inodes in commit > d891400 ("xfs: inode buffers may not be valid during recovery > readahead") but the problem was not recognised to exist for dquots > and their buffers as the dquot readahead did not have a verifier. > > The result of not using a verifier is that when the buffer is then > next read to replay a dquot modification, the dquot buffer verifier > will only be attached to the buffer if *readahead is not complete*. > Hence we can read the buffer, replay the dquot changes and then add > it to the delwri submission list without it having a verifier > attached to it. This then generates warnings in xfs_buf_ioapply(), > which catches and warns about this case. > > Fix this and make it handle the same readahead verifier error cases > as for inode buffers by adding a new readahead verifier that has a > write operation as well as a read operation that marks the buffer as > not done if any corruption is detected. Also make sure we don't run > readahead if the dquot buffer has been marked as cancelled by > recovery. > > This will result in readahead either succeeding and the buffer > having a valid write verifier, or readahead failing and the buffer > state requiring the subsequent read to resubmit the IO with the new > verifier. In either case, this will result in the buffer always > ending up with a valid write verifier on it. > > Note: we also need to fix the inode buffer readahead error handling > to mark the buffer with EIO. Brian noticed the code I copied from > there wrong during review, so fix it at the same time. Add comments > linking the two functions that handle readahead verifier errors > together so we don't forget this behavioural link in future. > > cc: <stable@xxxxxxxxxxxxxxx> # 3.12 - current > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > --- > > Version 2 > - fix logic error in determining if verify failed > - set error on buffer when verifier fails > - fix inode buffer readahead verifier to set error when it fails > - better comments, link dquot and inode buffer ra verifiers in the > comments > > fs/xfs/libxfs/xfs_dquot_buf.c | 36 ++++++++++++++++++++++++++++++------ > fs/xfs/libxfs/xfs_inode_buf.c | 14 +++++++++----- > fs/xfs/libxfs/xfs_quota_defs.h | 2 +- > fs/xfs/libxfs/xfs_shared.h | 1 + > fs/xfs/xfs_log_recover.c | 9 +++++++-- > 5 files changed, 48 insertions(+), 14 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_dquot_buf.c b/fs/xfs/libxfs/xfs_dquot_buf.c > index 11cefb2..3cc3cf7 100644 > --- a/fs/xfs/libxfs/xfs_dquot_buf.c > +++ b/fs/xfs/libxfs/xfs_dquot_buf.c ... > @@ -264,6 +264,25 @@ xfs_dquot_buf_read_verify( > } > > /* > + * readahead errors are silent and simply leave the buffer as !done so a real > + * read will then be run with the xfs_dquot_buf_ops verifier. See > + * xfs_inode_buf_verify() for why we use EIO and ~XBF_DONE here rather than > + * reporting the failure. > + */ > +static void > +xfs_dquot_buf_readahead_verify( > + struct xfs_buf *bp) > +{ > + struct xfs_mount *mp = bp->b_target->bt_mount; > + > + if (!xfs_dquot_buf_verify_crc(mp, bp) || > + !xfs_dquot_buf_verify(mp, bp, 0)) { > + xfs_buf_ioerror(bp, -EIO); > + bp->b_flags &= ~XBF_DONE; Do we really need to clear the flag if the I/O infrastructure doesn't set it until after the verifier is invoked? It's harmless, so if the intent is to just be cautious or future-proof: Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> ... but it does look slightly confusing, without a mention in the comments at least. Brian > + } > +} > + > +/* > * we don't calculate the CRC here as that is done when the dquot is flushed to > * the buffer after the update is done. This ensures that the dquot in the > * buffer always has an up-to-date CRC value. > @@ -274,7 +293,7 @@ xfs_dquot_buf_write_verify( > { > struct xfs_mount *mp = bp->b_target->bt_mount; > > - if (!xfs_dquot_buf_verify(mp, bp)) { > + if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN)) { > xfs_buf_ioerror(bp, -EFSCORRUPTED); > xfs_verifier_error(bp); > return; > @@ -287,3 +306,8 @@ const struct xfs_buf_ops xfs_dquot_buf_ops = { > .verify_write = xfs_dquot_buf_write_verify, > }; > > +const struct xfs_buf_ops xfs_dquot_buf_ra_ops = { > + .name = "xfs_dquot_ra", > + .verify_read = xfs_dquot_buf_readahead_verify, > + .verify_write = xfs_dquot_buf_write_verify, > +}; > diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c > index 1b8d98a..4816209 100644 > --- a/fs/xfs/libxfs/xfs_inode_buf.c > +++ b/fs/xfs/libxfs/xfs_inode_buf.c > @@ -62,11 +62,14 @@ xfs_inobp_check( > * has not had the inode cores stamped into it. Hence for readahead, the buffer > * may be potentially invalid. > * > - * If the readahead buffer is invalid, we don't want to mark it with an error, > - * but we do want to clear the DONE status of the buffer so that a followup read > - * will re-read it from disk. This will ensure that we don't get an unnecessary > - * warnings during log recovery and we don't get unnecssary panics on debug > - * kernels. > + * If the readahead buffer is invalid, we need to mark it with an error and > + * clear the DONE status of the buffer so that a followup read will re-read it > + * from disk. We don't report the error otherwise to avoid warnings during log > + * recovery and we don't get unnecssary panics on debug kernels. We use EIO here > + * because all we want to do is say readahead failed; there is no-one to report > + * the error to, so this will distinguish it from a non-ra verifier failure. > + * Changes to this readahead error behavour also need to be reflected in > + * xfs_dquot_buf_readahead_verify(). > */ > static void > xfs_inode_buf_verify( > @@ -92,6 +95,7 @@ xfs_inode_buf_verify( > XFS_ERRTAG_ITOBP_INOTOBP, > XFS_RANDOM_ITOBP_INOTOBP))) { > if (readahead) { > + xfs_buf_ioerror(bp, -EIO); > bp->b_flags &= ~XBF_DONE; > return; > } > diff --git a/fs/xfs/libxfs/xfs_quota_defs.h b/fs/xfs/libxfs/xfs_quota_defs.h > index 1b0a083..f51078f 100644 > --- a/fs/xfs/libxfs/xfs_quota_defs.h > +++ b/fs/xfs/libxfs/xfs_quota_defs.h > @@ -153,7 +153,7 @@ typedef __uint16_t xfs_qwarncnt_t; > #define XFS_QMOPT_RESBLK_MASK (XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS) > > extern int xfs_dqcheck(struct xfs_mount *mp, xfs_disk_dquot_t *ddq, > - xfs_dqid_t id, uint type, uint flags, char *str); > + xfs_dqid_t id, uint type, uint flags, const char *str); > extern int xfs_calc_dquots_per_chunk(unsigned int nbblks); > > #endif /* __XFS_QUOTA_H__ */ > diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h > index 5be5297..15c3ceb 100644 > --- a/fs/xfs/libxfs/xfs_shared.h > +++ b/fs/xfs/libxfs/xfs_shared.h > @@ -49,6 +49,7 @@ extern const struct xfs_buf_ops xfs_inobt_buf_ops; > extern const struct xfs_buf_ops xfs_inode_buf_ops; > extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; > extern const struct xfs_buf_ops xfs_dquot_buf_ops; > +extern const struct xfs_buf_ops xfs_dquot_buf_ra_ops; > extern const struct xfs_buf_ops xfs_sb_buf_ops; > extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; > extern const struct xfs_buf_ops xfs_symlink_buf_ops; > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index 26e67b4..da37beb 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -3521,6 +3521,7 @@ xlog_recover_dquot_ra_pass2( > struct xfs_disk_dquot *recddq; > struct xfs_dq_logformat *dq_f; > uint type; > + int len; > > > if (mp->m_qflags == 0) > @@ -3541,8 +3542,12 @@ xlog_recover_dquot_ra_pass2( > ASSERT(dq_f); > ASSERT(dq_f->qlf_len == 1); > > - xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, > - XFS_FSB_TO_BB(mp, dq_f->qlf_len), NULL); > + len = XFS_FSB_TO_BB(mp, dq_f->qlf_len); > + if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0)) > + return; > + > + xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len, > + &xfs_dquot_buf_ra_ops); > } > > STATIC void > > _______________________________________________ > xfs mailing list > xfs@xxxxxxxxxxx > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs