On Tue, Mar 29, 2022 at 10:52:04AM +0530, Chandan Babu R wrote:
> On 25 Mar 2022 at 03:44, Dave Chinner wrote:
> > On Mon, Mar 21, 2022 at 10:47:46AM +0530, Chandan Babu R wrote:
> >> The maximum file size that can be represented by the data fork extent
> >> counter in the worst case occurs when all extents are 1 block in length
> >> and each block is 1KB in size.
> >>
> >> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing the maximum extent
> >> count and with 1KB sized blocks, a file can reach up to,
> >> (2^31) * 1KB = 2TB
> >>
> >> This is much larger than the theoretical maximum size of a directory,
> >> i.e. 32GB * 3 = 96GB.
> >>
> >> Since a directory's inode can never overflow its data fork extent
> >> counter, this commit replaces checking the return value of
> >> xfs_iext_count_may_overflow() with calls to ASSERT(error == 0).
> >
> > I'd really prefer that we don't add noise like this to a bunch of
> > call sites. If directories can't overflow the extent count in
> > normal operation, then why are we even calling
> > xfs_iext_count_may_overflow() in these paths? i.e. an overflow would
> > be a sign of an inode corruption, and we should have flagged that
> > long before we do an operation that might overflow the extent count.
> >
> > So, really, I think you should document the directory size
> > constraints at the site where we define all the large extent count
> > values in xfs_format.h, remove the xfs_iext_count_may_overflow()
> > checks from the directory code and replace them with a simple inode
> > verifier check that we haven't got more than 100GB worth of
> > individual extents in the data fork for directory inodes....
>
> I don't think that we could trivially verify if the extents in a
> directory's data fork add up to more than 96GB.

dip->di_nextents tells us how many extents there are in the data
fork, and we know what the block size of the filesystem is, so it
should be pretty easy to calculate a maximum extent count for 96GB
of space. i.e. the absolute maximum valid directory data fork extent
count is (96GB / block size).

> xfs_dinode->di_size tracks the size of XFS_DIR2_DATA_SPACE. This also
> includes holes that could be created by freeing directory entries in a
> single directory block. Also, there is no easy method to determine the
> space occupied by the XFS_DIR2_LEAF_SPACE and XFS_DIR2_FREE_SPACE
> segments of a directory.

Sure there is. We do this sort of calculation for things like
transaction reservations via definitions like XFS_DA_NODE_MAXDEPTH.
That tells us immediately how many blocks can be in the
XFS_DIR2_LEAF_SPACE segment....

We also know the maximum number of individual directory blocks in
the 32GB segment (fixed at 32GB / dir block size), so the free space
array is also a fixed size at (32GB / dir block size / free space
entries per block).

It's easy to just use (96GB / block size), and that will catch most
corruptions with no risk of a false positive detection, but we could
quite easily refine this to something like:

	(data (32GB) +
	 leaf btree blocks (XFS_DA_NODE_MAXDEPTH) +
	 freesp (32GB / free space records per block))
		frags / filesystem block size

> Maybe the following can be added to xfs_dinode_verify(),
>
> if (S_ISDIR(mode) && ((xfs_dinode->di_size + 2 * 32GB) > 96GB))
>	return __this_address

That doesn't validate that the on-disk or in-memory di_nextents
value is within the known valid range. We can do that directly (as
per above), so we shouldn't need a heuristic like this.
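i.e. something like this untested sketch - written from memory, and
xfs_dir_max_dfork_extents() is just a name made up for illustration:

	/*
	 * Maximum valid data fork extent count for a directory inode.
	 * A directory is limited to three 32GB segments, so even when
	 * fragmented into single block extents its data fork can never
	 * index more than 96GB worth of blocks.
	 */
	static xfs_extnum_t
	xfs_dir_max_dfork_extents(
		struct xfs_mount	*mp)
	{
		return (96ULL << 30) >> mp->m_sb.sb_blocklog;
	}

Then, in xfs_dinode_verify(), once the mode and the data fork extent
count have been pulled out of the dinode:

	if (S_ISDIR(mode) && nextents > xfs_dir_max_dfork_extents(mp))
		return __this_address;

That implements the coarse (96GB / block size) bound; tightening it
to the refined calculation above only changes what the helper
returns, not the shape of the verifier check.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx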