On 24 Mar 2021 at 02:27, Darrick J. Wong wrote:
> On Tue, Mar 23, 2021 at 09:21:27PM +0530, Chandan Babu R wrote:
>> On 22 Mar 2021 at 23:26, Darrick J. Wong wrote:
>> > On Tue, Mar 09, 2021 at 10:31:16AM +0530, Chandan Babu R wrote:
>> >> Verify that XFS does not cause realtime bitmap/summary inode fork's
>> >> extent count to overflow when growing the realtime volume associated
>> >> with a filesystem.
>> >>
>> >> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
>> >> Signed-off-by: Chandan Babu R <chandanrlinux@xxxxxxxxx>
>> >
>> > Soo... I discovered that this test doesn't pass with multiblock
>> > directories:
>>
>> Thanks for the bug report and the description of the corresponding
>> solution. I am fixing the tests and will soon post corresponding
>> patches to the mailing list.
>
> Also, I found a problem with xfs/534 when it does the direct write tests
> to a pmem volume with DAX enabled:
>
> --- /tmp/fstests/tests/xfs/534.out       2021-03-21 11:44:09.384407426 -0700
> +++ /var/tmp/fstests/xfs/534.out.bad     2021-03-23 13:32:15.898301839 -0700
> @@ -5,7 +5,4 @@
>  Fallocate 15 blocks
>  Buffered write to every other block of fallocated space
>  Verify $testfile's extent count
> -* Direct write to unwritten extent
> -Fallocate 15 blocks
> -Direct write to every other block of fallocated space
> -Verify $testfile's extent count
> +Extent count overflow check failed: nextents = 11

The inode extent count overflow reported above was actually caused by the
buffered write operation, but it occurs with the direct write operation as
well. I was able to recreate the bug with an emulated pmem device on my
qemu guest.

>
> looking at the xfs_bmap output for $testfile shows:
>
> /opt/testfile:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..7]:          208..215          0 (208..215)           8 010000
>    1: [8..15]:         216..223          0 (216..223)           8 000000
>    2: [16..23]:        224..231          0 (224..231)           8 010000
>    3: [24..31]:        232..239          0 (232..239)           8 000000
>    4: [32..39]:        240..247          0 (240..247)           8 010000
>    5: [40..47]:        248..255          0 (248..255)           8 000000
>    6: [48..55]:        256..263          0 (256..263)           8 010000
>    7: [56..63]:        264..271          0 (264..271)           8 000000
>    8: [64..71]:        272..279          0 (272..279)           8 010000
>    9: [72..79]:        280..287          0 (280..287)           8 000000
>   10: [80..119]:       288..327          0 (288..327)          40 010000
>
> Which is ... odd since the same direct write gets cut off after writing
> to block 7 (like you'd expect since it's the same function) when DAX
> isn't enabled...
>
> ...OH, I see the problem.  For a non-DAX direct write,
> xfs_iomap_write_direct will allocate an unwritten block into a hole, but
> if the block was already mapped (written or unwritten) it won't do
> anything at all.  For that case, XFS_IEXT_ADD_NOSPLIT_CNT is sufficient,
> because in the worst case we add one extent to the data fork.
>
> For DAX writes, however, the behavior is different:
>
>         if (IS_DAX(VFS_I(ip))) {
>                 bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
>                 if (imap->br_state == XFS_EXT_UNWRITTEN) {
>                         force = true;
>                         dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
>                 }
>         }
>
> This tells xfs_bmapi_write that we want to /convert/ an unwritten extent
> to written, and we want to zero the blocks.  If we're dax-writing into
> the middle of an unwritten range, this will cause a split.  The correct
> parameter there would be XFS_IEXT_WRITE_UNWRITTEN_CNT.  Would you mind
> sending a kernel patch to fix that?

Sure, I will work on fixing both the buffered and direct I/O extent count
overflow issues. Thanks for reporting the bug.

-- 
chandan
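
P.S. For reference, the change I have in mind for the DAX case in
xfs_iomap_write_direct() looks roughly like the following. This is an
untested sketch rather than the final patch; it assumes the extent count
check in that function is still the xfs_iext_count_may_overflow() call,
and the nr_exts local variable is my own naming:

        int                     nr_exts = XFS_IEXT_ADD_NOSPLIT_CNT;

        if (IS_DAX(VFS_I(ip))) {
                bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
                if (imap->br_state == XFS_EXT_UNWRITTEN) {
                        force = true;
                        /*
                         * Sketch only: a DAX write into the middle of an
                         * unwritten extent converts it in place and may
                         * split it, so reserve for the worst-case
                         * unwritten conversion rather than a single
                         * no-split extent addition.
                         */
                        nr_exts = XFS_IEXT_WRITE_UNWRITTEN_CNT;
                        dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0) << 1;
                }
        }

        ...

        error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, nr_exts);

i.e. only the extent count passed to xfs_iext_count_may_overflow() would
change for the DAX unwritten case; I will confirm the exact call site when
writing the actual patch, and the buffered write overflow will be handled
separately.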