On 16 Feb 2022 at 09:29, Dave Chinner wrote: > On Tue, Feb 15, 2022 at 05:16:33PM -0800, Darrick J. Wong wrote: >> On Tue, Feb 15, 2022 at 06:46:16PM +0530, Chandan Babu R wrote: >> > On 15 Feb 2022 at 17:03, Chandan Babu R wrote: >> > > On 15 Feb 2022 at 15:03, Dave Chinner wrote: >> > >> On Tue, Feb 15, 2022 at 12:18:50PM +0530, Chandan Babu R wrote: >> > >>> On 14 Feb 2022 at 22:37, Darrick J. Wong wrote: >> > >>> > On Fri, Feb 11, 2022 at 05:40:30PM +0530, Chandan Babu R wrote: >> > >>> >> On 07 Feb 2022 at 22:41, Darrick J. Wong wrote: >> > >>> >> > On Mon, Feb 07, 2022 at 10:25:19AM +0530, Chandan Babu R wrote: >> > >>> >> >> On 02 Feb 2022 at 01:31, Darrick J. Wong wrote: >> > >>> >> >> > On Fri, Jan 21, 2022 at 10:48:54AM +0530, Chandan Babu R wrote: >> > >>> >> >> I went through all the call sites of xfs_iext_count_may_overflow() and I think >> > >>> >> >> that your suggestion can be implemented. >> > >>> >> >> > >>> >> Sorry, I missed/overlooked the usage of xfs_iext_count_may_overflow() in >> > >>> >> xfs_symlink(). >> > >>> >> >> > >>> >> Just after invoking xfs_iext_count_may_overflow(), we execute the following >> > >>> >> steps, >> > >>> >> >> > >>> >> 1. Allocate inode chunk >> > >>> >> 2. Initialize inode chunk. >> > >>> >> 3. Insert record into inobt/finobt. >> > >>> >> 4. Roll the transaction. >> > >>> >> 5. Allocate ondisk inode. >> > >>> >> 6. Add directory inode to transaction. >> > >>> >> 7. Allocate blocks to store symbolic link path name. >> > >>> >> 8. Log symlink's inode (data fork contains block mappings). >> > >>> >> 9. Log data blocks containing symbolic link path name. >> > >>> >> 10. Add name to directory and log directory's blocks. >> > >>> >> 11. Log directory inode. >> > >>> >> 12. Commit transaction. >> > >>> >> >> > >>> >> xfs_trans_roll() invoked in step 4 would mean that we cannot move step 6 to >> > >>> >> occur before step 1 since xfs_trans_roll would unlock the inode by executing >> > >>> >> xfs_inode_item_committing(). >> > >>> >> >> > >>> >> xfs_create() has a similar flow. >> > >>> >> >> > >>> >> Hence, I think we should retain the current logic of setting >> > >>> >> XFS_DIFLAG2_NREXT64 just after reading the inode from the disk. >> > >>> > >> > >>> > File creation shouldn't ever run into problems with >> > >>> > xfs_iext_count_may_overflow because (a) only symlinks get created with >> > >>> > mapped blocks, and never more than two; and (b) we always set NREXT64 >> > >>> > (the inode flag) on new files if NREXT64 (the superblock feature bit) is >> > >>> > enabled, so a newly created file will never require upgrading. >> > >>> >> > >>> The inode representing the symbolic link being created cannot overflow its >> > >>> data fork extent count field. However, the inode representing the directory >> > >>> inside which the symbolic link entry is being created, might overflow its data >> > >>> fork extent count field. >> > >> >> > >> I dont' think that can happen. A directory is limited in size to 3 >> > >> segments of 32GB each. In reality, only the data segment can ever >> > >> reach 32GB as both the dabtree and free space segments are just >> > >> compact indexes of the contents of the 32GB data segment. >> > >> >> > >> Hence a directory is never likely to reach more than about 40GB of >> > >> blocks which is nowhere near large enough to overflowing a 32 bit >> > >> extent count field. >> > > >> > > I think you are right. >> > > >> > > The maximum file size that can be represented by the data fork extent counter >> > > in the worst case occurs when all extents are 1 block in size and each block >> > > is 1k in size. >> > > >> > > With 1k byte sized blocks, a file can reach upto, >> > > 1k * (2^31) = 2048 GB >> > > >> > > This is much larger than the asymptotic maximum size of a directory i.e. >> > > 32GB * 3 = 96GB. >> >> The downside of getting rid of the checks for directories is that we >> won't be able to use the error injection knob that limits all forks to >> 10 extents max to see what happens when that part of directory expansion >> fails. But if it makes it easier to handle nrext64, then that's >> probably a good enough reason to forego that. > > If you want error injection to do that, add explicit error injection > to the directory code. The transaction might already be dirty before entering the directory code (e.g. xfs_dir_createname()). In this case, an error return from xfs_iext_count_may_overflow() will cause the filesystem to be shut down. On the other hand, removing calls to xfs_iext_count_may_overflow() from the previously listed directory functions would result in the error injection knob to not work for directories. This would require us to delete xfs/533 test. Leaving the current invocations of xfs_iext_count_may_overflow() in their respective locations would mean that they are essentially no-ops for functions which manipulate directories. However, with functions like xfs_symlink() and xfs_create(), I wouldn't be able to add the inode to the transaction before invoking xfs_iext_count_may_overflow() because this leads to inode being unlocked when rolling the transaction. Therefore I think we should not change the current code flow w.r.t to functions associated with directory entry manipulation. i.e. 1. Let xfs_iext_count_may_overflow() continue to be no-op w.r.t directory manipulation. 2. Since xfs_iext_count_may_overflow() is a no-op, there is no need to move "add inode to transaction" code to occur before invoking xfs_iext_count_may_overflow(). > >> > xfs_bmap_del_extent_real() >> >> Not sure about this one, since it actually /can/ result in more extents. > > Yup, unlikely to ever trigger, but still necessary for correctness. > -- chandan