Re: [PATCH V8 15/19] xfs: Directory's data fork extent counter can never overflow

Chandan Babu R <chandan.babu@xxxxxxxxxx> · Wed, 30 Mar 2022 21:09:00 +0530

On 30 Mar 2022 at 09:13, Darrick J. Wong wrote:
> On Tue, Mar 29, 2022 at 05:23:40PM +1100, Dave Chinner wrote:
>> On Tue, Mar 29, 2022 at 10:52:04AM +0530, Chandan Babu R wrote:
>> > On 25 Mar 2022 at 03:44, Dave Chinner wrote:
>> > > On Mon, Mar 21, 2022 at 10:47:46AM +0530, Chandan Babu R wrote:
>> > >> The maximum file size that can be represented by the data fork extent counter
>> > >> in the worst case occurs when all extents are 1 block in length and each block
>> > >> is 1KB in size.
>> > >> 
>> > >> With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
>> > >> 1KB sized blocks, a file can reach up to,
>> > >> (2^31) * 1KB = 2TB
>> > >> 
>> > >> This is much larger than the theoretical maximum size of a directory
>> > >> i.e. 32GB * 3 = 96GB.
>> > >> 
>> > >> Since a directory's inode can never overflow its data fork extent counter,
>> > >> this commit replaces checking the return value of
>> > >> xfs_iext_count_may_overflow() with calls to ASSERT(error == 0).
>> > >
>> > > I'd really prefer that we don't add noise like this to a bunch of
>> > > call sites.  If directories can't overflow the extent count in
>> > > normal operation, then why are we even calling
>> > > xfs_iext_count_may_overflow() in these paths? i.e. an overflow would
>> > > be a sign of an inode corruption, and we should have flagged that
>> > > long before we do an operation that might overflow the extent count.
>> > >
>> > > So, really, I think you should document the directory size
>> > > constraints at the site where we define all the large extent count
>> > > values in xfs_format.h, remove the xfs_iext_count_may_overflow()
>> > > checks from the directory code and replace them with a simple inode
>> > > verifier check that we haven't got more than 100GB worth of
>> > > individual extents in the data fork for directory inodes....
>> > 
>> > I don't think that we could trivially verify if the extents in a directory's
>> > data fork add up to more than 96GB.
>> 
>> dip->di_nextents tells us how many extents there are in the data
>> fork, we know what the block size of the filesystem is, so it should
>> be pretty easy to calculate a maximum extent count for 96GB of
>> space. i.e. absolute maximum valid dir data fork extent count
>> is (96GB / blocksize).
>> 
>> > 
>> > xfs_dinode->di_size tracks the size of XFS_DIR2_DATA_SPACE. This also includes
>> > holes that could be created by freeing directory entries in a single directory
>> > block. Also, there is no easy method to determine the space occupied by
>> > XFS_DIR2_LEAF_SPACE and XFS_DIR2_FREE_SPACE segments of a directory.
>> 
>> Sure there is. We do this sort of calc for things like transaction
>> reservations via definitions like XFS_DA_NODE_MAXDEPTH. That tells us
>
> Hmmm.  Seeing as I just replaced XFS_BTREE_MAXLEVELS with dynamic limits
> set for each filesystem, is XFS_DA_NODE_MAXDEPTH even appropriate for
> modern filesystems?  We're about to start allowing far more extended
> attributes in the form of parent pointers, so we should be careful about
> this.
>
> For a directory, there can be at most 32GB of directory entries, so the
> maximum number of directory entries is...
>
> 32GB / (directory block size) * (max entries per dir block)
>
> The dabtree stores (u32 hash, u32 offset) records, so I guess it
> wouldn't be so hard to compute the number of blocks needed for each node
> level until we only need one block, and that's our real
> XFS_DA_NODE_MAXDEPTH.
>
> But then the second question is: what's the maximum height of a dabtree
> that indexes an xattr structure?  I don't think there's any maximum
> limit within XFS on the number of attrs you can set on a file, is there?
> At least until you hit the iext_max_count check.  I think the VFS
> institutes its own limit of 64k for the llistxattr buffer, but that's
> about all I can think of.
>
> I suppose right now the xattr structure can't grow larger than 2^(16+21)
> blocks in size, which is 2^49 bytes, but that's a mix of attr leaves and
> dabtree blocks, unlike directories, right?
>
>> immediately how many blocks can be in the XFS_DIR2_LEAF_SPACE
>> segment....
>> 
>> We also know the maximum number of individual directory blocks in
>> the 32GB segment (fixed at 32GB / dir block size), so the free space
>> array is also a fixed size at (32GB / dir block size / free space
>> entries per block).
>> 
>> It's easy to just use (96GB / block size) and that will catch most
>> corruptions with no risk of a false positive detection, but we could
>> quite easily refine this to something like:
>> 
>> data	(32GB +				
>> leaf	 btree blocks(XFS_DA_NODE_MAXDEPTH) +
>> freesp	 (32GB / free space records per block))
>> frags					/ filesystem block size
>
> I think we ought to do a more careful study of XFS_DA_NODE_MAXDEPTH, but
> it could become more involved than we think.  In the interest of keeping
> this series moving, can we start with a new verifier check that
> (di_nextents < that formula from above) and then refine that based on
> whatever improvements we may or may not come up with for
> XFS_DA_NODE_MAXDEPTH?
>

Are you referring to the (dip->di_nextents <= (96GB / blocksize)) check?
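
To make sure we mean the same thing, here is a minimal userspace sketch of
that bound. It is not the actual verifier code: the names DIR_MAX_BYTES,
dir_max_data_extents() and dir_nextents_ok() are hypothetical, and the
sketch assumes the worst case discussed above, i.e. every extent is one
filesystem block long and the directory's three 32GB segments are fully
populated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: three 32GB directory segments (data, leaf, free). */
#define DIR_MAX_BYTES	(96ULL << 30)

/*
 * Upper bound on the data fork extent count of a directory, assuming
 * every extent is a single filesystem block. blocklog is log2 of the
 * filesystem block size (10 for 1KB blocks, 12 for 4KB blocks).
 */
static uint64_t dir_max_data_extents(unsigned int blocklog)
{
	return DIR_MAX_BYTES >> blocklog;
}

/* Hypothetical check: true if nextents is within the bound above. */
static bool dir_nextents_ok(uint64_t nextents, unsigned int blocklog)
{
	return nextents <= dir_max_data_extents(blocklog);
}
```

With 1KB blocks this gives 96 * 2^20 = 100663296 extents, which is well
below the 2^31 limit mentioned in the commit message, so a check of this
form should not produce false positives.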

-- 
chandan