On Sun, Sep 03, 2023 at 09:15:59AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> The previous patch to reload unrecovered unlinked inodes when adding a
> newly created inode to the unlinked list is missing a key piece of
> functionality. It doesn't handle the case that someone calls xfs_iget
> on an inode that is not the last item in the incore list. For example,
> if at mount time the ondisk iunlink bucket looks like this:
> 
> AGI -> 7 -> 22 -> 3 -> NULL
> 
> None of these three inodes are cached in memory. Now let's say that
> someone tries to open inode 3 by handle. We need to walk the list to
> make sure that inodes 7 and 22 get loaded cold, and that the
> i_prev_unlinked of inode 3 gets set to 22.
> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_export.c |    6 +++
>  fs/xfs/xfs_inode.c  |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_inode.h  |    9 +++++
>  fs/xfs/xfs_itable.c |    9 +++++
>  fs/xfs/xfs_trace.h  |   20 ++++++++++
>  5 files changed, 144 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
> index 1064c2342876..f71ea786a6d2 100644
> --- a/fs/xfs/xfs_export.c
> +++ b/fs/xfs/xfs_export.c
> @@ -146,6 +146,12 @@ xfs_nfs_get_inode(
>  		return ERR_PTR(error);
>  	}
>  
> +	error = xfs_inode_reload_unlinked(ip);
> +	if (error) {
> +		xfs_irele(ip);
> +		return ERR_PTR(error);
> +	}

We don't want to be creating an empty transaction, locking the inode
and then cancelling the transaction having done nothing on every NFS
handle we have to resolve.

We only want to do this if the link count is zero and i_prev_unlinked
== 0, at which point we can then take the slow path (i.e. call
xfs_inode_reload_unlinked()) to reload the unlinked list.

Something like:

	if (xfs_inode_unlinked_incomplete(ip)) {
		error = xfs_inode_reload_unlinked(ip);
		if (error) {
			xfs_irele(ip);
			return ERR_PTR(error);
		}
	}

Hmmmm.
If i_nlink is zero on ip and we call xfs_irele() on it, then it will
be punted straight to inactivation, which will try to remove it from
the unlinked list and free it. But we just failed to reload the
unlinked list, so inactivation is going to go badly...

So on reload error, shouldn't we be shutting down the filesystem
because we know we cannot safely remove this inode from the unlinked
list and free it?

> +
>  	if (VFS_I(ip)->i_generation != generation) {
>  		xfs_irele(ip);
>  		return ERR_PTR(-ESTALE);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 6cd2f29b540a..56f6bde6001b 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3607,3 +3607,103 @@ xfs_iunlock2_io_mmap(
>  	if (ip1 != ip2)
>  		inode_unlock(VFS_I(ip1));
>  }
> +
> +/*
> + * Reload the incore inode list for this inode. Caller should ensure that
> + * the link count cannot change, either by taking ILOCK_SHARED or otherwise
> + * preventing other threads from executing.
> + */
> +int
> +xfs_inode_reload_unlinked_bucket(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_buf		*agibp;
> +	struct xfs_agi		*agi;
> +	struct xfs_perag	*pag;
> +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ip->i_ino);
> +	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
> +	xfs_agino_t		prev_agino, next_agino;
> +	unsigned int		bucket;
> +	bool			foundit = false;
> +	int			error;
> +
> +	/* Grab the first inode in the list */
> +	pag = xfs_perag_get(mp, agno);
> +	error = xfs_ialloc_read_agi(pag, tp, &agibp);
> +	xfs_perag_put(pag);
> +	if (error)
> +		return error;
> +
> +	bucket = agino % XFS_AGI_UNLINKED_BUCKETS;
> +	agi = agibp->b_addr;
> +
> +	trace_xfs_inode_reload_unlinked_bucket(ip);
> +
> +	xfs_info_ratelimited(mp,
> + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.",
> +			agino, agno);
> +
> +	prev_agino = NULLAGINO;
> +	next_agino = be32_to_cpu(agi->agi_unlinked[bucket]);
> +	while (next_agino != NULLAGINO) {
> +		struct xfs_inode	*next_ip = NULL;
> +
> +		if (next_agino == agino) {
> +			/* Found this inode, set its backlink. */
> +			next_ip = ip;
> +			next_ip->i_prev_unlinked = prev_agino;
> +			foundit = true;
> +		}
> +		if (!next_ip) {
> +			/* Inode already in memory. */
> +			next_ip = xfs_iunlink_lookup(pag, next_agino);
> +		}
> +		if (!next_ip) {
> +			/* Inode not in memory, reload. */
> +			error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
> +					next_agino);
> +			if (error)
> +				break;
> +
> +			next_ip = xfs_iunlink_lookup(pag, next_agino);
> +		}
> +		if (!next_ip) {
> +			/* No incore inode at all? We reloaded it... */
> +			ASSERT(next_ip != NULL);
> +			error = -EFSCORRUPTED;
> +			break;
> +		}

Not a great fan of this logic construct, nor am I a great fan of asking
you to restructure code for vanity reasons. However, I do think a goto
improves it quite a lot:

	while (next_agino != NULLAGINO) {
		struct xfs_inode	*next_ip = NULL;

		if (next_agino == agino) {
			/* Found this inode, set its backlink. */
			next_ip = ip;
			next_ip->i_prev_unlinked = prev_agino;
			foundit = true;
			goto next_inode;
		}

		/* Try in-memory lookup first. */
		next_ip = xfs_iunlink_lookup(pag, next_agino);
		if (next_ip)
			goto next_inode;

		/* Inode not in memory, try reloading it. */
		error = xfs_iunlink_reload_next(tp, agibp, prev_agino,
				next_agino);
		if (error)
			break;

		/* Grab the reloaded inode */
		next_ip = xfs_iunlink_lookup(pag, next_agino);
		if (!next_ip) {
			/* No incore inode at all, list must be corrupted. */
			ASSERT(next_ip != NULL);
			error = -EFSCORRUPTED;
			break;
		}

next_inode:
		prev_agino = next_agino;
		next_agino = next_ip->i_next_unlinked;
	}

If you think it's better, feel free to use this. Otherwise I can live
with the code until I have to touch this code again...

......
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index f225413a993c..ea38d69b9922 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -80,6 +80,15 @@ xfs_bulkstat_one_int(
>  	if (error)
>  		goto out;
>  
> +	if (xfs_inode_unlinked_incomplete(ip)) {
> +		error = xfs_inode_reload_unlinked_bucket(tp, ip);
> +		if (error) {
> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			xfs_irele(ip);
> +			return error;
> +		}
> +	}

Same question here about shutdown on error being necessary....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx