On 29 Sep 2021 at 05:09, Dave Chinner wrote:
> On Tue, Sep 28, 2021 at 03:19:29PM +0530, Chandan Babu R wrote:
>> On 28 Sep 2021 at 04:36, Dave Chinner wrote:
>> > On Thu, Sep 16, 2021 at 03:36:44PM +0530, Chandan Babu R wrote:
>> >> @@ -492,9 +494,16 @@ struct xfs_bulk_ireq {
>> >>   */
>> >>  #define XFS_BULK_IREQ_METADIR	(1 << 2)
>> >>
>> >> -#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
>> >> +#define XFS_BULK_IREQ_BULKSTAT	(1 << 3)
>> >> +
>> >> +#define XFS_BULK_IREQ_FLAGS_ALL	(XFS_BULK_IREQ_AGNO | \
>> >>  				 XFS_BULK_IREQ_SPECIAL | \
>> >> -				 XFS_BULK_IREQ_METADIR)
>> >> +				 XFS_BULK_IREQ_METADIR | \
>> >> +				 XFS_BULK_IREQ_BULKSTAT)
>> >
>> > What's this XFS_BULK_IREQ_METADIR thing? I haven't noticed that when
>> > scanning any recent proposed patch series....
>> >
>> XFS_BULK_IREQ_METADIR is from Darrick's tree. His "Kill XFS_BTREE_MAXLEVELS"
>> patch series is based on his other patchsets. His recent "xfs: support
>> dynamic btree cursor height" patch series rebases only the required patchset
>> on top of the v5.15-rc1 kernel, eliminating the others.
>
> OK, so how much testing has this had on just a straight v5.15-rcX
> kernel?
>

I haven't tested this patchset on v5.15-rcX yet. I will have to rebase my
patchset on top of Darrick's patchset, and will also need the xfsprogs
version of "xfs: support dynamic btree cursor height".
>> >> @@ -134,7 +136,26 @@ xfs_bulkstat_one_int(
>> >>
>> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
>> >>  	buf->bs_extsize_blks = ip->i_extsize;
>> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
>> >> +
>> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
>> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
>> >> +		xfs_extnum_t max_nextents = XFS_IFORK_EXTCNT_MAXS32;
>> >> +
>> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
>> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
>> >> +			max_nextents = 10;
>> >> +
>> >> +		if (nextents > max_nextents) {
>> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
>> >> +			xfs_irele(ip);
>> >> +			error = -EINVAL;
>> >> +			goto out_advance;
>> >> +		}
>> >
>> > So we return an EINVAL error if any extent overflows the 32 bit
>> > counter? Why isn't this -EOVERFLOW?
>> >
>> Returning -EINVAL causes xfs_bulkstat_iwalk() to skip inodes whose extent
>> count is larger than what fits in a 32-bit field. Returning -EOVERFLOW
>> causes the bulkstat ioctl to stop reporting the remaining inodes.
>
> Ok, that's a bad behaviour we need to fix because it will cause
> things like old versions of xfsdump to miss inodes that
> have overflowing extent counts. i.e. it will cause incomplete
> backups, and the failure will likely be silent.
>
> I asked about -EOVERFLOW because that's what stat() returns when an
> inode attribute value doesn't fit in the stat_buf field (e.g. a 64 bit
> inode number on a 32 bit kernel), and if we are overflowing the
> bulkstat field then we really should be telling userspace that an
> overflow occurred.
>
> /me has a sudden realisation that the xfsdump format may not
> support large extent counts and goes looking...
>
> Yeah, xfsdump doesn't support extent counts greater than 2^32. So
> that means we really do need -EOVERFLOW errors here.
> i.e. if we get an extent count overflow with a !(bc->breq->flags &
> XFS_IBULK_NREXT64) bulkstat walk, xfsdump needs bulkstat to fill
> out the inode with all the fields that aren't overflowed, then
> error out with -EOVERFLOW.
>
> Bulkstat itself should not silently skip the inode because it would
> overflow a field in struct xfs_bstat - the decision of what to
> do with the overflow is something xfsdump needs to handle, not the
> kernel. Hence we need to return -EOVERFLOW here so that userspace
> can decide what to do with an inode it can't handle...
>

Ok. I had never thought of the xfsdump use case. I will fix this issue as
well. I guess adding the ability for xfsdump to work with 64-bit extent
counters can be done after I address all the issues pointed out with the
current patchset.

Thanks a lot for reviewing this patchset.

--
chandan