On Thu, Sep 07, 2023 at 05:02:12PM +1000, Dave Chinner wrote: > On Sun, Sep 03, 2023 at 09:15:59AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > The previous patch to reload unrecovered unlinked inodes when adding a > > newly created inode to the unlinked list is missing a key piece of > > functionality. It doesn't handle the case that someone calls xfs_iget > > on an inode that is not the last item in the incore list. For example, > > if at mount time the ondisk iunlink bucket looks like this: > > > > AGI -> 7 -> 22 -> 3 -> NULL > > > > None of these three inodes are cached in memory. Now let's say that > > someone tries to open inode 3 by handle. We need to walk the list to > > make sure that inodes 7 and 22 get loaded cold, and that the > > i_prev_unlinked of inode 3 gets set to 22. > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > --- > > fs/xfs/xfs_export.c | 6 +++ > > fs/xfs/xfs_inode.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_inode.h | 9 +++++ > > fs/xfs/xfs_itable.c | 9 +++++ > > fs/xfs/xfs_trace.h | 20 ++++++++++ > > 5 files changed, 144 insertions(+) > > > > > > diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c > > index 1064c2342876..f71ea786a6d2 100644 > > --- a/fs/xfs/xfs_export.c > > +++ b/fs/xfs/xfs_export.c > > @@ -146,6 +146,12 @@ xfs_nfs_get_inode( > > return ERR_PTR(error); > > } > > > > + error = xfs_inode_reload_unlinked(ip); > > + if (error) { > > + xfs_irele(ip); > > + return ERR_PTR(error); > > + } > > We don't want to be creating an empty transaction, locking the inode > and then cancelling the transaction having done nothing on every > NFS handle we have to resolve. We only want to do this if the link > count is zero and i_prev_unlinked == 0, at which point we can then > take the slow path (i.e call xfs_inode_reload_unlinked()) to reload > the unlinked list. Something like: Hmm. Using an unlocked check here makes me nervous, since we could be racing with another thread that is unlinking the inode. OTOH, this code doesn't start *changing* any incore state until it's taken ILOCK_SHARED and locked the AGI buffer to stabilize @ip's unlinked list. So. I think it's ok to do the unlocked check here like you propose: > if (xfs_inode_unlinked_incomplete(ip)) { > error = xfs_inode_reload_unlinked(ip); > if (error) { > xfs_irele(ip); > return ERR_PTR(error); > } > } Though I want to add one more xfs_inode_unlinked_incomplete call inside xfs_inode_reload_unlinked_bucket to skip the list walk if we take the locks only to find that someone else fixed it up for us. > Hmmmm. if i_nlink is zero on ip and we call xfs_irele() on it, the > it will be punted straight to inactivation, which will try to remove > it from the unlinked list and free it. But we just failed to reload > the unlinked list, so inactivation is going to go badly... > > So on reload error, shouldn't we be shutting down the filesystem > because we know we cannot safely remove this inode from the unlinked > list and free it? I think the iunlink lookup failure will take down the filesystem for us when inactivation runs on @ip, but you're right, we should shut down immediately so that the stack trace will show xfs_nfs_get_inode and not just a background thread. > > + > > if (VFS_I(ip)->i_generation != generation) { > > xfs_irele(ip); > > return ERR_PTR(-ESTALE); > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > index 6cd2f29b540a..56f6bde6001b 100644 > > --- a/fs/xfs/xfs_inode.c > > +++ b/fs/xfs/xfs_inode.c > > @@ -3607,3 +3607,103 @@ xfs_iunlock2_io_mmap( > > if (ip1 != ip2) > > inode_unlock(VFS_I(ip1)); > > } > > + > > +/* > > + * Reload the incore inode list for this inode. Caller should ensure that > > + * the link count cannot change, either by taking ILOCK_SHARED or otherwise > > + * preventing other threads from executing. > > + */ > > +int > > +xfs_inode_reload_unlinked_bucket( > > + struct xfs_trans *tp, > > + struct xfs_inode *ip) > > +{ > > + struct xfs_mount *mp = tp->t_mountp; > > + struct xfs_buf *agibp; > > + struct xfs_agi *agi; > > + struct xfs_perag *pag; > > + xfs_agnumber_t agno = XFS_INO_TO_AGNO(mp, ip->i_ino); > > + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); > > + xfs_agino_t prev_agino, next_agino; > > + unsigned int bucket; > > + bool foundit = false; > > + int error; > > + > > + /* Grab the first inode in the list */ > > + pag = xfs_perag_get(mp, agno); > > + error = xfs_ialloc_read_agi(pag, tp, &agibp); > > + xfs_perag_put(pag); > > + if (error) > > + return error; > > + > > + bucket = agino % XFS_AGI_UNLINKED_BUCKETS; > > + agi = agibp->b_addr; > > + > > + trace_xfs_inode_reload_unlinked_bucket(ip); > > + > > + xfs_info_ratelimited(mp, > > + "Found unrecovered unlinked inode 0x%x in AG 0x%x. Initiating list recovery.", > > + agino, agno); > > + > > + prev_agino = NULLAGINO; > > + next_agino = be32_to_cpu(agi->agi_unlinked[bucket]); > > + while (next_agino != NULLAGINO) { > > + struct xfs_inode *next_ip = NULL; > > + > > + if (next_agino == agino) { > > + /* Found this inode, set its backlink. */ > > + next_ip = ip; > > + next_ip->i_prev_unlinked = prev_agino; > > + foundit = true; > > + } > > + if (!next_ip) { > > + /* Inode already in memory. */ > > + next_ip = xfs_iunlink_lookup(pag, next_agino); > > + } > > + if (!next_ip) { > > + /* Inode not in memory, reload. */ > > + error = xfs_iunlink_reload_next(tp, agibp, prev_agino, > > + next_agino); > > + if (error) > > + break; > > + > > + next_ip = xfs_iunlink_lookup(pag, next_agino); > > + } > > + if (!next_ip) { > > + /* No incore inode at all? We reloaded it... */ > > + ASSERT(next_ip != NULL); > > + error = -EFSCORRUPTED; > > + break; > > + } > > Not a great fan of this logic construct, nor am I great fan of > asking you to restructure code for vanity reasons. However, I do > think a goto improves it quite a lot: > > while (next_agino != NULLAGINO) { > struct xfs_inode *next_ip = NULL; > > if (next_agino == agino) { > /* Found this inode, set its backlink. */ > next_ip = ip; > next_ip->i_prev_unlinked = prev_agino; > foundit = true; > goto next_inode; > } > > /* Try in-memory lookup first. */ > next_ip = xfs_iunlink_lookup(pag, next_agino); > if (next_ip) > goto next_inode; > > /* Inode not in memory, try reloading it. */ > error = xfs_iunlink_reload_next(tp, agibp, prev_agino, > next_agino); > if (error) > break; > > /* Grab the reloaded inode */ > next_ip = xfs_iunlink_lookup(pag, next_agino); > if (!next_ip) { > /* No incore inode at all, list must be corrupted. */ > ASSERT(next_ip != NULL); > error = -EFSCORRUPTED; > break; > } > next_inode: > prev_agino = next_agino; > next_agino = next_ip->i_next_unlinked; > } > > If you think it's better, feel free to use this. Otherwise I can > live with the code until I have to touch this code again... > > ...... I'll use it, thanks. It /does/ make the loop body easier to understand since there are fewer "uhh when *is* next_ip null?" questions. > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c > > index f225413a993c..ea38d69b9922 100644 > > --- a/fs/xfs/xfs_itable.c > > +++ b/fs/xfs/xfs_itable.c > > @@ -80,6 +80,15 @@ xfs_bulkstat_one_int( > > if (error) > > goto out; > > > > + if (xfs_inode_unlinked_incomplete(ip)) { > > + error = xfs_inode_reload_unlinked_bucket(tp, ip); > > + if (error) { > > + xfs_iunlock(ip, XFS_ILOCK_SHARED); > > + xfs_irele(ip); > > + return error; > > + } > > + } > > Same question here about shutdown on error being necessary.... Will add a xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE) call here too. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx