Re: [PATCH V5 13/16] xfs: Conditionally upgrade existing inodes to use 64-bit extent counters

Chandan Babu R <chandan.babu@xxxxxxxxxx> · Tue, 15 Feb 2022 18:46:16 +0530

On 15 Feb 2022 at 17:03, Chandan Babu R wrote:
> On 15 Feb 2022 at 15:03, Dave Chinner wrote:
>> On Tue, Feb 15, 2022 at 12:18:50PM +0530, Chandan Babu R wrote:
>>> On 14 Feb 2022 at 22:37, Darrick J. Wong wrote:
>>> > On Fri, Feb 11, 2022 at 05:40:30PM +0530, Chandan Babu R wrote:
>>> >> On 07 Feb 2022 at 22:41, Darrick J. Wong wrote:
>>> >> > On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote:
>>> >> >> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote:
>>> >> >> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote:
>>> >> >> I went through all the call sites of xfs_iext_count_may_overflow() and I think
>>> >> >> that your suggestion can be implemented.
>>> >> 
>>> >> Sorry, I missed/overlooked the usage of xfs_iext_count_may_overflow() in
>>> >> xfs_symlink().
>>> >> 
>>> >> Just after invoking xfs_iext_count_may_overflow(), we execute the following
>>> >> steps,
>>> >> 
>>> >> 1. Allocate inode chunk
>>> >> 2. Initialize inode chunk.
>>> >> 3. Insert record into inobt/finobt.
>>> >> 4. Roll the transaction.
>>> >> 5. Allocate ondisk inode.
>>> >> 6. Add directory inode to transaction.
>>> >> 7. Allocate blocks to store symbolic link path name.
>>> >> 8. Log symlink's inode (data fork contains block mappings).
>>> >> 9. Log data blocks containing symbolic link path name.
>>> >> 10. Add name to directory and log directory's blocks.
>>> >> 11. Log directory inode.
>>> >> 12. Commit transaction.
>>> >> 
>>> >> xfs_trans_roll() invoked in step 4 would mean that we cannot move step 6 to
>>> >> occur before step 1 since xfs_trans_roll would unlock the inode by executing
>>> >> xfs_inode_item_committing().
>>> >> 
>>> >> xfs_create() has a similar flow.
>>> >> 
>>> >> Hence, I think we should retain the current logic of setting
>>> >> XFS_DIFLAG2_NREXT64 just after reading the inode from the disk.
>>> >
>>> > File creation shouldn't ever run into problems with
>>> > xfs_iext_count_may_overflow because (a) only symlinks get created with
>>> > mapped blocks, and never more than two; and (b) we always set NREXT64
>>> > (the inode flag) on new files if NREXT64 (the superblock feature bit) is
>>> > enabled, so a newly created file will never require upgrading.
>>> 
>>> The inode representing the symbolic link being created cannot overflow its
>>> data fork extent count field. However, the inode representing the directory
>>> inside which the symbolic link entry is being created, might overflow its data
>>> fork extent count field.
>>
>> I dont' think that can happen. A directory is limited in size to 3
>> segments of 32GB each. In reality, only the data segment can ever
>> reach 32GB as both the dabtree and free space segments are just
>> compact indexes of the contents of the 32GB data segment.
>>
>> Hence a directory is never likely to reach more than about 40GB of
>> blocks which is nowhere near large enough to overflowing a 32 bit
>> extent count field.
>
> I think you are right.
>
> The maximum file size that can be represented by the data fork extent counter
> in the worst case occurs when all extents are 1 block in size and each block
> is 1k in size.
>
> With 1k byte sized blocks, a file can reach upto,
> 1k * (2^31) = 2048 GB
>
> This is much larger than the asymptotic maximum size of a directory i.e.
> 32GB * 3 = 96GB.

Also, I think I should remove extent count overflow checks performed in the
following functions,

xfs_create()
xfs_rename()
xfs_link()
xfs_symlink()
xfs_bmap_del_extent_real()

... Since they do not accomplish anything.

Please let me know your views on this.

-- 
chandan