On Thu, Oct 10, 2024 at 05:49:40PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Create a xfs_metafile_iget function for metadata inodes to ensure that > when we try to iget a metadata file, the inobt thinks a metadata inode > is in use and that the metadata type matches what we are expecting. > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > --- ..... > +/* > + * Get a metadata inode. The metafile @type must match the inode exactly. > + * Caller must supply a transaction (even if empty) to avoid livelocking if the > + * inobt has a cycle. Is this something we have to be concerned with for normal operation? We don't care about needing to detect inobt cycles this when doing lookups from the VFS during normal operation, so why is this an issue we have to be explicitly concerned about during metadir traversals? Additionally, I can see how a corrupt btree ptr loop could cause a *deadlock* without a transaction context (i.e. end up waiting on a lock we already hold) but I don't see what livelock the transaction context prevents. It appears to me that it would only turn the deadlock into a livelock because the second buffer lookup will find the locked buffer already linked to the transaction and simply take another reference to it. Hence it will just run around the loop of buffers taking references forever (i.e. a livelock) instead of deadlocking. Another question: why are we only concerned cycles in the inobt? If we've got a cycle in any other btree the metadir code might interact with (e.g. directories, free space, etc), we're going to have the same problems with deadlocks and/or livelocks on tree traversal. > + */ > +int > +xfs_trans_metafile_iget( > + struct xfs_trans *tp, > + xfs_ino_t ino, > + enum xfs_metafile_type metafile_type, > + struct xfs_inode **ipp) > +{ > + struct xfs_mount *mp = tp->t_mountp; > + struct xfs_inode *ip; > + umode_t mode; > + int error; > + > + error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &ip); Along a similar line: why don't we trust our own internal inode references to be correct and valid? These aren't structures that are externally visible or accessible, so the only thing that should be interacting with metadir inodes is trusted kernel code. At minimum, this needs a big comment explaining why we can't trust our metadir structure and why this is different to normal inode lookup done for inodes accessed by path traversal from the root dir. Also, doing a trusted lookup means we don't have to walk the inobt during the inode lookup, and so the deadlock/livelock problem goes away.... > @@ -1133,45 +1130,54 @@ xfs_rtmount_iread_extents( > * Get the bitmap and summary inodes and the summary cache into the mount > * structure at mount time. > */ > -int /* error */ > +int > xfs_rtmount_inodes( > - xfs_mount_t *mp) /* file system mount structure */ > + struct xfs_mount *mp) > { > - int error; /* error return value */ > - xfs_sb_t *sbp; > + struct xfs_trans *tp; > + struct xfs_sb *sbp = &mp->m_sb; > + int error; > > - sbp = &mp->m_sb; > - error = xfs_iget(mp, NULL, sbp->sb_rbmino, 0, 0, &mp->m_rbmip); .... and it's clear that we currently treat the superblock inodes as trusted inode numbers. We don't validate that they are allocated when we look them up, we trust that they have been correctly allocated and properly referenced on disk. It's not clear to me why there's an undocumented change of trust for internal, trusted access only on-disk metadata being made here... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx