Re: [PATCH V5 13/16] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters

Chandan Babu R <chandan.babu@xxxxxxxxxx> · Tue, 15 Feb 2022 17:03:17 +0530

On 15 Feb 2022 at 15:03, Dave Chinner wrote:
> On Tue, Feb 15, 2022 at 12:18:50PM +0530, Chandan Babu R wrote:
>> On 14 Feb 2022 at 22:37, Darrick J. Wong wrote:
>> > On Fri, Feb 11, 2022 at 05:40:30PM +0530, Chandan Babu R wrote:
>> >> On 07 Feb 2022 at 22:41, Darrick J. Wong wrote:
>> >> > On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote:
>> >> >> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote:
>> >> >> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote:
>> >> >> I went through all the call sites of xfs_iext_count_may_overflow() and I think
>> >> >> that your suggestion can be implemented.
>> >> 
>> >> Sorry, I missed/overlooked the usage of xfs_iext_count_may_overflow() in
>> >> xfs_symlink().
>> >> 
>> >> Just after invoking xfs_iext_count_may_overflow(), we execute the following
>> >> steps,
>> >> 
>> >> 1. Allocate inode chunk
>> >> 2. Initialize inode chunk.
>> >> 3. Insert record into inobt/finobt.
>> >> 4. Roll the transaction.
>> >> 5. Allocate ondisk inode.
>> >> 6. Add directory inode to transaction.
>> >> 7. Allocate blocks to store symbolic link path name.
>> >> 8. Log symlink's inode (data fork contains block mappings).
>> >> 9. Log data blocks containing symbolic link path name.
>> >> 10. Add name to directory and log directory's blocks.
>> >> 11. Log directory inode.
>> >> 12. Commit transaction.
>> >> 
>> >> xfs_trans_roll() invoked in step 4 would mean that we cannot move step 6 to
>> >> occur before step 1 since xfs_trans_roll would unlock the inode by executing
>> >> xfs_inode_item_committing().
>> >> 
>> >> xfs_create() has a similar flow.
>> >> 
>> >> Hence, I think we should retain the current logic of setting
>> >> XFS_DIFLAG2_NREXT64 just after reading the inode from the disk.
>> >
>> > File creation shouldn't ever run into problems with
>> > xfs_iext_count_may_overflow because (a) only symlinks get created with
>> > mapped blocks, and never more than two; and (b) we always set NREXT64
>> > (the inode flag) on new files if NREXT64 (the superblock feature bit) is
>> > enabled, so a newly created file will never require upgrading.
>> 
>> The inode representing the symbolic link being created cannot overflow its
>> data fork extent count field. However, the inode representing the directory
>> inside which the symbolic link entry is being created, might overflow its data
>> fork extent count field.
>
> I dont' think that can happen. A directory is limited in size to 3
> segments of 32GB each. In reality, only the data segment can ever
> reach 32GB as both the dabtree and free space segments are just
> compact indexes of the contents of the 32GB data segment.
>
> Hence a directory is never likely to reach more than about 40GB of
> blocks which is nowhere near large enough to overflowing a 32 bit
> extent count field.

I think you are right.

The maximum file size that can be represented by the data fork extent counter
in the worst case occurs when all extents are 1 block in size and each block
is 1k in size.

With 1k byte sized blocks, a file can reach upto,
1k * (2^31) = 2048 GB

This is much larger than the asymptotic maximum size of a directory i.e.
32GB * 3 = 96GB.

-- 
chandan