Re: [PATCH v3] fstests: btrfs: Test if btrfs will corrupt nodatasum compressed extent when replacing device

Qu Wenruo <wqu@xxxxxxx> · Thu, 14 Jun 2018 15:23:44 +0800

On 2018年06月14日 15:09, Nikolay Borisov wrote:
> 
> 
> On 14.06.2018 10:04, Qu Wenruo wrote:
>>
>>
>> On 2018年06月14日 14:45, Nikolay Borisov wrote:
>>>
>>>
>>> On 14.06.2018 09:30, Qu Wenruo wrote:
>>>> This is a long existing bug (from 2012) but exposed by a reporter
>>>> recently, that when compressed extent without data csum get written to
>>>> device-replace target device, the written data is in fact uncompressed data
>>>> other than the original compressed data.
>>>>
>>>> And since btrfs still consider the data is compressed and will try to read it
>>>> as compressed, it can cause read error.
>>>>
>>>> The root cause is located, and fix already sent.
>>>> "btrfs: scrub: Don't use inode pages for device replace".
>>>>
>>>> Reported-by: James Harvey <jamespharvey20@xxxxxxxxx>
>>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>>
>>> Codewise lgtm:
>>>
>>> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>
>>>
>>> However, I'm having a bit of trouble reconciling the explanation so bear
>>> with me, please look below.
>>>
>>>> ----
>>>> changelog:
>>>> v2:
>>>>   Now the fix patch is no longer RFC.
>>>>   Remove _require_test as we don't really touch it.
>>>>   Add comment on the mount cycle.
>>>>   Add the test to group 'volume'.
>>>> v3:
>>>>   Use latest template.
>>>>   Rebased to latest upstream base.
>>>> ---
>>>>  tests/btrfs/165     | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>>>  tests/btrfs/165.out |  2 ++
>>>>  tests/btrfs/group   |  1 +
>>>>  3 files changed, 79 insertions(+)
>>>>  create mode 100755 tests/btrfs/165
>>>>  create mode 100644 tests/btrfs/165.out
>>>>
>>>> diff --git a/tests/btrfs/165 b/tests/btrfs/165
>>>> new file mode 100755
>>>> index 000000000000..eb9bb61c9ea3
>>>> --- /dev/null
>>>> +++ b/tests/btrfs/165
>>>> @@ -0,0 +1,76 @@
>>>> +#! /bin/bash
>>>> +# SPDX-License-Identifier: GPL-2.0
>>>> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
>>>> +#
>>>> +# FS QA Test 165
>>>> +#
>>>> +# Test if btrfs will corrupt compressed data extent without data csum
>>>> +# by replacing it with uncompressed data, when doing device replace.
>>>> +#
>>>> +# This could be fixed by the following patch:
>>>> +# "btrfs: scrub: Don't use inode pages for device replace"
>>>> +#
>>>> +seq=`basename $0`
>>>> +seqres=$RESULT_DIR/$seq
>>>> +echo "QA output created by $seq"
>>>> +
>>>> +here=`pwd`
>>>> +tmp=/tmp/$$
>>>> +status=1	# failure is the default!
>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>> +
>>>> +_cleanup()
>>>> +{
>>>> +	cd /
>>>> +	rm -f $tmp.*
>>>> +}
>>>> +
>>>> +# get standard environment, filters and checks
>>>> +. ./common/rc
>>>> +. ./common/filter
>>>> +
>>>> +# remove previous $seqres.full before test
>>>> +rm -f $seqres.full
>>>> +
>>>> +# real QA test starts here
>>>> +
>>>> +# Modify as appropriate.
>>>> +_supported_fs btrfs
>>>> +_supported_os Linux
>>>> +_require_scratch_dev_pool 2
>>>> +_require_scratch_dev_pool_equal_size
>>>> +
>>>> +_scratch_dev_pool_get 1
>>>> +_spare_dev_get
>>>> +_scratch_pool_mkfs >> $seqres.full 2>&1
>>>> +
>>>> +# Create nodatasum inode
>>>> +_scratch_mount "-o nodatasum"
>>>> +touch $SCRATCH_MNT/nodatasum_file
>>>
>>> So we create an inode with nodatasum
>>>
>>>> +_scratch_remount "datasum,compress"
>>>> +_pwrite_byte 0xcd 0 128K $SCRATCH_MNT/nodatasum_file > /dev/null
>>>
>>> Then we remount the fs with datasum and compress. Now, what should the
>>> behavior of that particular inode be. IMO all rights to it should be
>>> nodatasum and shouldn't really be compressed (because the inode retained
>>> it's original nodatasum profile).
>>
>> Yes, that should be the ultimate fix.
>>
>> So here we have 2 fixes:
>> 1) Just fix the inode page error
>>    The fix submitted upstream
>>
>> 2) Fix the nodatasum/nodatacow/compression mess
>>    I submitted similar fix some time ago.
>>    (that inode_need_compress() patch)
>>
>> The 2) one has one problem, it won't save the already generated
>> compressed nodatasum extent.
>>
>> So 1) is still needed anyway, if only 2) is applied, a replace can still
>> corrupt affected extents.
>> (And 1) can be super small to get into current pull request)
>>
>>>
>>>> +
>>>> +# Write the compressed data back to disk
>>>> +sync
>>>> +
>>>> +# Replace the device
>>>> +_run_btrfs_util_prog replace start -Bf 1 $SPARE_DEV $SCRATCH_MNT
>>>> +
>>>> +# Unmount to drop all cache so next read will read from disk
>>>> +_scratch_unmount
>>>> +_mount $SPARE_DEV $SCRATCH_MNT
>>>> +
>>>> +# Now the EXTENT_DATA item still marks the extent as compressed,
>>>
>>> I think the actual bug is that EXTENT_DATA item shouldn't be marked as
>>> compressed in the first place because the indoe is in nodatasum mode, no ?
>>
>> Yes, but also explained above, the damage is already done, and even we
>> fix that way, the affected users/inodes can still hit the corruption bug
>> anyway.
> 
> Fair enough, but in this case there are really 2 fixes. One which fixes
> the root of the issue - which still hasn't landed.

That needs another test case, not this one.
(Although any of the fixed mentioned could make this test pass).

For the root compression vs nodatasum test, the test case should detect
the fiemap or EXTENT_DATA flags directly, without the need to replace
device or read the data.

> And it's changelog
> should be along the lines of "ensure we don't compress inodes which have
> been in nodatasum mode".

It would affect this test case, but not as a direct test case as
checking the EXTENT_DATA flag.

> And a second one, which fixes just the symptom
> for already corrupted extents - this is the if (0...) commit which was
> in latest pull req, correct?

At least currently, this test case is the most direct way to verify the
upstream fix.

So personally speaking, 2 fixes 2 test cases.
And for the upstream fix, this test case is the most relative one, thus
the comment in this test case still makes sense here.

Thanks,
Qu

> 
>>
>> So the upstream fix is more or less a reminder of what we did wrong in
>> the far past.
>>
>> Thanks,
>> Qu
>>
>>>
>>> All things considered, isn't the actual bug that an extent item is
>>> erroneously flagged as compressed when it shouldn't have been due to the
>>> presence of nodatasum?
>>>
>>>> +# but the on-disk data is uncompressed, thus reading it as compressed
>>>> +# will definitely cause EIO.
>>>> +cat $SCRATCH_MNT/nodatasum_file > /dev/null
>>>> +
>>>> +_scratch_unmount
>>>> +_spare_dev_put
>>>> +_scratch_dev_pool_put
>>>> +
>>>> +echo "Silence is golden"
>>>> +
>>>> +# success, all done
>>>> +status=0
>>>> +exit
>>>> diff --git a/tests/btrfs/165.out b/tests/btrfs/165.out
>>>> new file mode 100644
>>>> index 000000000000..94ec17dc1075
>>>> --- /dev/null
>>>> +++ b/tests/btrfs/165.out
>>>> @@ -0,0 +1,2 @@
>>>> +QA output created by 165
>>>> +Silence is golden
>>>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>>>> index 35354de2ea6f..91a1ebadae7c 100644
>>>> --- a/tests/btrfs/group
>>>> +++ b/tests/btrfs/group
>>>> @@ -167,3 +167,4 @@
>>>>  162 auto quick volume
>>>>  163 auto quick volume
>>>>  164 auto quick volume
>>>> +165 auto quick replace volume
>>>>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html