On Tue, Nov 15, 2022 at 02:49:54PM +1100, Dave Chinner wrote: > On Sun, Oct 02, 2022 at 11:20:29AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > In commit d658e, we tried to improve the robustnes of xchk_get_inode in > > the face of EINVAL returns from iget by calling xfs_imap to see if the > > inobt itself thinks that the inode is allocated. Unfortunately, that > > commit didn't consider the possibility that the inode gets allocated > > after iget but before imap. In this case, the imap call will succeed, > > but we turn that into a corruption error and tell userspace the inode is > > corrupt. > > > > Avoid this false corruption report by grabbing the AGI header and > > retrying the iget before calling imap. If the iget succeeds, we can > > proceed with the usual scrub-by-handle code. Fix all the incorrect > > comments too, since unreadable/corrupt inodes no longer result in EINVAL > > returns. > > > > Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode") > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > OK. > > > --- > > fs/xfs/scrub/common.c | 203 ++++++++++++++++++++++++++++++++++++++++--------- > > fs/xfs/scrub/common.h | 4 + > > fs/xfs/xfs_icache.c | 3 - > > fs/xfs/xfs_icache.h | 1 > > 4 files changed, 172 insertions(+), 39 deletions(-) > > > > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c > > index 42a25488bd25..9a811c5fa8f7 100644 > > --- a/fs/xfs/scrub/common.c > > +++ b/fs/xfs/scrub/common.c > > @@ -635,6 +635,14 @@ xchk_ag_init( > > > > /* Per-scrubber setup functions */ > > > > +void > > +xchk_trans_cancel( > > + struct xfs_scrub *sc) > > +{ > > + xfs_trans_cancel(sc->tp); > > + sc->tp = NULL; > > +} > > + > > /* > > * Grab an empty transaction so that we can re-grab locked buffers if > > * one of our btrees turns out to be cyclic. > > @@ -720,6 +728,80 @@ xchk_iget( > > return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp); > > } > > > > +/* > > + * Try to grab an inode. If we can't, return the AGI buffer so that the caller > > + * can single-step the loading process to see where things went wrong. > > "Try to grab an inode in a manner that avoids races with physical inode > allocation. If we can't, return the locked AGI buffer so that the > caller can single-step the loading process to see where things went > wrong." Fixed. > > + * > > + * If the iget succeeds, return 0, a NULL AGI, and the inode. > > + * If the iget fails, return the error, the AGI, and a NULL inode. This can > > "... the locked AGI, ...." > > > + * include -EINVAL and -ENOENT for invalid inode numbers or inodes that are no > > + * longer allocated; or any other corruption or runtime error. > > + * If the AGI read fails, return the error, a NULL AGI, and NULL inode. > > + * If a fatal signal is pending, return -EINTR, a NULL AGI, and a NULL inode. > > + */ > > +int > > +xchk_iget_agi( > > + struct xfs_scrub *sc, > > + xfs_ino_t inum, > > + struct xfs_buf **agi_bpp, > > + struct xfs_inode **ipp) > > +{ > > + struct xfs_mount *mp = sc->mp; > > + struct xfs_trans *tp = sc->tp; > > + struct xfs_perag *pag; > > + int error; > > + > > +again: > > + *agi_bpp = NULL; > > + *ipp = NULL; > > + error = 0; > > + > > + if (xchk_should_terminate(sc, &error)) > > + return error; > > + > > + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, inum)); > > + error = xfs_ialloc_read_agi(pag, tp, agi_bpp); > > + xfs_perag_put(pag); > > + if (error) > > + return error; > > + > > + error = xfs_iget(mp, tp, inum, > > + XFS_IGET_NOWAIT | XFS_IGET_UNTRUSTED, 0, ipp); > > OK, IGET_NOWAIT is new.... > > > +#define XFS_IGET_NOWAIT 0x10 /* return EAGAIN instead of blocking */ > > But it doesn't prevent blocking. XFS_IGET_UNTRUSTED means we do a > inobt record lookup (btree buffer locking and IO that can block) as > well as reading the inode cluster from disk if it's not already in > cache. Hence this isn't what I'd call a "non-blocking" or "no wait" > operation. > > AFAICT, what is needed here is avoiding the -retry mechanism- of the > lookup, not "non blocking/no wait" semantics. i.e. this seems > reasonable to get an -EAGAIN error instead of delaying and trying > again if we are using XFS_IGET_NORETRY semantics.... Aha, yes, thank you for the better name. :) /* Return -EAGAIN immediately if the inode is unavailable. */ #define XFS_IGET_NORETRY (1U << 5) > > + if (error == -EAGAIN) { > > + /* > > + * Cached inode awaiting inactivation. Drop the AGI buffer in > > + * case the inactivation worker is now waiting for it, and try > > + * the iget again. > > + */ > > That's not the only reason xfs_iget() could return -EAGAIN, > right? radix_tree_preload() failing can cause -EAGAIN to be returned > from xfs_iget_cache_miss(), as can an instantiation race inserting > the new inode into the radix tree. The cache hit path has a bunch > more cases, too. Perhaps: Yes, I suppose that's possible if the incore inode gets evicted and someone else is reinstantiating it at the same time we're looking for it... > /* > * The inode may be in core but temporarily > * unavailable and may require the AGI buffer before > * it can be returned. Drop the AGI buffer and retry > * the lookup. > */ ...so yes, this is a better explanation of what's going on. > > + xfs_trans_brelse(tp, *agi_bpp); > > + delay(1); > > + goto again; > > + } > > + if (error == 0) { > > + /* We got the inode, so we can release the AGI. */ > > + ASSERT(*ipp != NULL); > > + xfs_trans_brelse(tp, *agi_bpp); > > + *agi_bpp = NULL; > > + } > > + > > + return error; > > That's better written as: > > if (error) > return error; > > /* We got the inode, so we can release the AGI. */ > ASSERT(*ipp != NULL); > xfs_trans_brelse(tp, *agi_bpp); > *agi_bpp = NULL; > return 0; > > Other than that, the code makes sense. Fixed. Thanks for the review. --D > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx