Re: [PATCH 05/28] xfs: iget for metadata inodes

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 15 Oct 2024 17:55:23 +1100

On Thu, Oct 10, 2024 at 05:49:40PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Create a xfs_metafile_iget function for metadata inodes to ensure that
> when we try to iget a metadata file, the inobt thinks a metadata inode
> is in use and that the metadata type matches what we are expecting.
> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> ---
.....

> +/*
> + * Get a metadata inode.  The metafile @type must match the inode exactly.
> + * Caller must supply a transaction (even if empty) to avoid livelocking if the
> + * inobt has a cycle.

Is this something we have to be concerned with for normal operation?
We don't care about needing to detect inobt cycles this when doing
lookups from the VFS during normal operation, so why is this an
issue we have to be explicitly concerned about during metadir
traversals?

Additionally, I can see how a corrupt btree ptr loop could cause a
*deadlock* without a transaction context (i.e. end up waiting on a
lock we already hold) but I don't see what livelock the transaction
context prevents. It appears to me that it would only turn the
deadlock into a livelock because the second buffer lookup will find
the locked buffer already linked to the transaction and simply take
another reference to it.  Hence it will just run around the loop of
buffers taking references forever (i.e. a livelock) instead of
deadlocking.

Another question: why are we only concerned cycles in the inobt? If
we've got a cycle in any other btree the metadir code might interact
with (e.g. directories, free space, etc), we're going to have the
same problems with deadlocks and/or livelocks on tree traversal. 

> + */
> +int
> +xfs_trans_metafile_iget(
> +	struct xfs_trans	*tp,
> +	xfs_ino_t		ino,
> +	enum xfs_metafile_type	metafile_type,
> +	struct xfs_inode	**ipp)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_inode	*ip;
> +	umode_t			mode;
> +	int			error;
> +
> +	error = xfs_iget(mp, tp, ino, XFS_IGET_UNTRUSTED, 0, &ip);

Along a similar line: why don't we trust our own internal inode
references to be correct and valid? These aren't structures that are
externally visible or accessible, so the only thing that should be
interacting with metadir inodes is trusted kernel code.

At minimum, this needs a big comment explaining why we can't trust
our metadir structure and why this is different to normal inode
lookup done for inodes accessed by path traversal from the root
dir.

Also, doing a trusted lookup means we don't have to walk the inobt
during the inode lookup, and so the deadlock/livelock problem goes
away....

> @@ -1133,45 +1130,54 @@ xfs_rtmount_iread_extents(
>   * Get the bitmap and summary inodes and the summary cache into the mount
>   * structure at mount time.
>   */
> -int					/* error */
> +int
>  xfs_rtmount_inodes(
> -	xfs_mount_t	*mp)		/* file system mount structure */
> +	struct xfs_mount	*mp)
>  {
> -	int		error;		/* error return value */
> -	xfs_sb_t	*sbp;
> +	struct xfs_trans	*tp;
> +	struct xfs_sb		*sbp = &mp->m_sb;
> +	int			error;
>  
> -	sbp = &mp->m_sb;
> -	error = xfs_iget(mp, NULL, sbp->sb_rbmino, 0, 0, &mp->m_rbmip);

.... and it's clear that we currently treat the superblock inodes as
trusted inode numbers. We don't validate that they are allocated
when we look them up, we trust that they have been correctly
allocated and properly referenced on disk.

It's not clear to me why there's an undocumented change of
trust for internal, trusted access only on-disk metadata being made
here...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx