Re: [PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 5 Dec 2019 13:42:22 -0800

On Thu, Dec 05, 2019 at 09:36:18AM -0500, Brian Foster wrote:
> On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > and swidth values could cause xfs_repair to fail loudly.  The problem
> > here is that repair calculates the where mkfs should have allocated the
> > root inode, based on the superblock geometry.  The allocation decisions
> > depend on sunit, which means that we really can't go updating sunit if
> > it would lead to a subsequent repair failure on an otherwise correct
> > filesystem.
> > 
> > Port the computation code from xfs_repair and teach mount to avoid the
> > ondisk update if it would cause problems for repair.  We allow the mount
> > to proceed (and new allocations will reflect this new geometry) because
> > we've never screened this kind of thing before.
> > 
> > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> > 
> > Reported-by: Alex Lyakas <alex@xxxxxxxxxx>
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > ---
> > v2: compute the root inode location directly
> > ---
> >  fs/xfs/libxfs/xfs_ialloc.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_ialloc.h |    1 +
> >  fs/xfs/xfs_mount.c         |   51 ++++++++++++++++++----------
> >  3 files changed, 115 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index 988cde7744e6..6df9bcc96251 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -2909,3 +2909,84 @@ xfs_ialloc_setup_geometry(
> >  	else
> >  		igeo->ialloc_align = 0;
> >  }
> > +
> > +/*
> > + * Compute the location of the root directory inode that is laid out by mkfs.
> > + * The @sunit parameter will be copied from the superblock if it is negative.
> > + */
> > +xfs_ino_t
> > +xfs_ialloc_calc_rootino(
> > +	struct xfs_mount	*mp,
> > +	int			sunit)
> > +{
> > +	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> > +	xfs_agino_t		first_agino;
> > +	xfs_agblock_t		first_bno;
> > +
> > +	if (sunit < 0)
> > +		sunit = mp->m_sb.sb_unit;
> > +
> > +	/*
> > +	 * Pre-calculate the geometry of ag 0. We know what it looks like
> > +	 * because we know what mkfs does: 2 allocation btree roots (by block
> > +	 * and by size), the inode allocation btree root, the free inode
> > +	 * allocation btree root (if enabled) and some number of blocks to
> > +	 * prefill the agfl.
> > +	 *
> > +	 * Because the current shape of the btrees may differ from the current
> > +	 * shape, we open code the mkfs freelist block count here. mkfs creates
> > +	 * single level trees, so the calculation is pretty straight forward for
> > +	 * the trees that use the AGFL.
> > +	 */
> > +
> 
> I know this code is lifted from userspace, but.. "the current shape of
> the btrees may differ from the current shape, .." Eh?

Heh, ok, I'll clean up the comment too.

> > +	/* free space by block btree root comes after the ag headers */
> > +	first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> > +
> > +	/* free space by length btree root */
> > +	first_bno += 1;
> > +
> > +	/* inode btree root */
> > +	first_bno += 1;
> > +
> > +	/* agfl */
> > +	first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
> > +
> 
> This is a little subtle from the userspace code. The extra +1 here is
> where we go from pointing at metadata blocks (i.e. bnobt root) to the
> first free block (i.e., past metadata blocks), right? If so, I wonder if
> this should be incorporated either at the beginning or end with a
> comment for explanation (i.e. "Start by pointing at the first block
> after the AG headers and increment by size of applicable metadata to
> locate the first free block ...").
> 
> I'm guessing this is a historical artifact of the userspace code as
> features were added. The fact that userspace uses different variables
> somewhat helps self-document in that context, and we lose that here.

Yeah.  I'll rework the comments to make it clearer what's going on.

"first_bno is the first block in which we could allocate an inode
chunk after factoring in..."

"...the ag headers"

"...the bnobt / cntbt"

"...the inobt"

etc.

> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > +		first_bno++;
> > +		/* agfl blocks */
> > +		first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
> > +	}
> > +
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > +		first_bno++;
> > +
> > +	/*
> > +	 * If the log is allocated in the first allocation group we need to
> > +	 * add the number of blocks used by the log to the above calculation.
> > +	 *
> > +	 * This can happens with filesystems that only have a single
> > +	 * allocation group, or very odd geometries created by old mkfs
> > +	 * versions on very small filesystems.
> > +	 */
> > +	if (mp->m_sb.sb_logstart &&
> > +	    XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> > +		 first_bno += mp->m_sb.sb_logblocks;
> > +
> > +	/*
> > +	 * ditto the location of the first inode chunks in the fs ('/')
> > +	 */
> > +	if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> > +		first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
> > +	} else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> > +		   mp->m_sb.sb_inoalignmt > 1)  {
> > +		first_agino = XFS_AGB_TO_AGINO(mp,
> > +				roundup(first_bno, mp->m_sb.sb_inoalignmt));
> > +	} else  {
> > +		first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> > +	}
> > +
> > +	return XFS_AGINO_TO_INO(mp, 0, first_agino);
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> > index 323592d563d5..72b3468b97b1 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.h
> > +++ b/fs/xfs/libxfs/xfs_ialloc.h
> > @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
> >  
> >  int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> >  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> > +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
> >  
> >  #endif	/* __XFS_IALLOC_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index fca65109cf24..a4eb3ae34a84 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -363,9 +363,10 @@ xfs_readsb(
> >   * Update alignment values based on mount options and sb values
> >   */
> >  STATIC int
> > -xfs_update_alignment(xfs_mount_t *mp)
> > +xfs_update_alignment(
> > +	struct xfs_mount	*mp)
> >  {
> > -	xfs_sb_t	*sbp = &(mp->m_sb);
> > +	struct xfs_sb		*sbp = &mp->m_sb;
> >  
> >  	if (mp->m_dalign) {
> >  		/*
> > @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
> >  			}
> >  		}
> >  
> > -		/*
> > -		 * Update superblock with new values
> > -		 * and log changes
> > -		 */
> > -		if (xfs_sb_version_hasdalign(sbp)) {
> > -			if (sbp->sb_unit != mp->m_dalign) {
> > -				sbp->sb_unit = mp->m_dalign;
> > -				mp->m_update_sb = true;
> > -			}
> > -			if (sbp->sb_width != mp->m_swidth) {
> > -				sbp->sb_width = mp->m_swidth;
> > -				mp->m_update_sb = true;
> > -			}
> > -		} else {
> > +		/* Update superblock with new values and log changes. */
> > +		if (!xfs_sb_version_hasdalign(sbp)) {
> >  			xfs_warn(mp,
> >  	"cannot change alignment: superblock does not support data alignment");
> >  			return -EINVAL;
> >  		}
> > +
> > +		if (sbp->sb_unit == mp->m_dalign &&
> > +		    sbp->sb_width == mp->m_swidth)
> > +			return 0;
> > +
> > +		/*
> > +		 * If the sunit/swidth change would move the precomputed root
> > +		 * inode value, we must reject the ondisk change because repair
> > +		 * will stumble over that.  However, we allow the mount to
> > +		 * proceed because we never rejected this combination before.
> > +		 */
> > +		if (sbp->sb_rootino !=
> > +		    xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> > +			xfs_warn(mp,
> > +	"cannot change stripe alignment: would require moving root inode");
> > +
> 
> FWIW, I read this error message as the mount option was ignored. I don't
> much care whether we ignore the mount option or simply the on-disk
> update, but the error could be a bit more clear in the latter case.

Ok, I'll add a message about how we're skipping the sb update.

> Also, what is the expected behavior for xfs_info in the latter
> situation?

A previous revision of the patch had the ioctl feeding xfs_info using
the incore values, but Dave objected so I dropped it.

--D

> Brian
> 
> > +			/*
> > +			 * XXX: Next time we add a new incompat feature, this
> > +			 * should start returning -EINVAL.
> > +			 */
> > +			return 0;
> > +		}
> > +
> > +		sbp->sb_unit = mp->m_dalign;
> > +		sbp->sb_width = mp->m_swidth;
> > +		mp->m_update_sb = true;
> >  	} else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
> >  		    xfs_sb_version_hasdalign(&mp->m_sb)) {
> > -			mp->m_dalign = sbp->sb_unit;
> > -			mp->m_swidth = sbp->sb_width;
> > +		mp->m_dalign = sbp->sb_unit;
> > +		mp->m_swidth = sbp->sb_width;
> >  	}
> >  
> >  	return 0;
> > 
>