Re: [PATCH v3] fstests: btrfs: Test if btrfs will corrupt nodatasum compressed extent when replacing device

Qu Wenruo <wqu@xxxxxxx> · Thu, 14 Jun 2018 15:04:43 +0800

On 2018年06月14日 14:45, Nikolay Borisov wrote:
> 
> 
> On 14.06.2018 09:30, Qu Wenruo wrote:
>> This is a long existing bug (from 2012) but exposed by a reporter
>> recently, that when compressed extent without data csum get written to
>> device-replace target device, the written data is in fact uncompressed data
>> other than the original compressed data.
>>
>> And since btrfs still consider the data is compressed and will try to read it
>> as compressed, it can cause read error.
>>
>> The root cause is located, and fix already sent.
>> "btrfs: scrub: Don't use inode pages for device replace".
>>
>> Reported-by: James Harvey <jamespharvey20@xxxxxxxxx>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> 
> Codewise lgtm:
> 
> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>
> 
> However, I'm having a bit of trouble reconciling the explanation so bear
> with me, please look below.
> 
>> ----
>> changelog:
>> v2:
>>   Now the fix patch is no longer RFC.
>>   Remove _require_test as we don't really touch it.
>>   Add comment on the mount cycle.
>>   Add the test to group 'volume'.
>> v3:
>>   Use latest template.
>>   Rebased to latest upstream base.
>> ---
>>  tests/btrfs/165     | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/165.out |  2 ++
>>  tests/btrfs/group   |  1 +
>>  3 files changed, 79 insertions(+)
>>  create mode 100755 tests/btrfs/165
>>  create mode 100644 tests/btrfs/165.out
>>
>> diff --git a/tests/btrfs/165 b/tests/btrfs/165
>> new file mode 100755
>> index 000000000000..eb9bb61c9ea3
>> --- /dev/null
>> +++ b/tests/btrfs/165
>> @@ -0,0 +1,76 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 165
>> +#
>> +# Test if btrfs will corrupt compressed data extent without data csum
>> +# by replacing it with uncompressed data, when doing device replace.
>> +#
>> +# This could be fixed by the following patch:
>> +# "btrfs: scrub: Don't use inode pages for device replace"
>> +#
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +
>> +here=`pwd`
>> +tmp=/tmp/$$
>> +status=1	# failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> +	cd /
>> +	rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +
>> +# remove previous $seqres.full before test
>> +rm -f $seqres.full
>> +
>> +# real QA test starts here
>> +
>> +# Modify as appropriate.
>> +_supported_fs btrfs
>> +_supported_os Linux
>> +_require_scratch_dev_pool 2
>> +_require_scratch_dev_pool_equal_size
>> +
>> +_scratch_dev_pool_get 1
>> +_spare_dev_get
>> +_scratch_pool_mkfs >> $seqres.full 2>&1
>> +
>> +# Create nodatasum inode
>> +_scratch_mount "-o nodatasum"
>> +touch $SCRATCH_MNT/nodatasum_file
> 
> So we create an inode with nodatasum
> 
>> +_scratch_remount "datasum,compress"
>> +_pwrite_byte 0xcd 0 128K $SCRATCH_MNT/nodatasum_file > /dev/null
> 
> Then we remount the fs with datasum and compress. Now, what should the
> behavior of that particular inode be. IMO all rights to it should be
> nodatasum and shouldn't really be compressed (because the inode retained
> it's original nodatasum profile).

Yes, that should be the ultimate fix.

So here we have 2 fixes:
1) Just fix the inode page error
   The fix submitted upstream

2) Fix the nodatasum/nodatacow/compression mess
   I submitted similar fix some time ago.
   (that inode_need_compress() patch)

The 2) one has one problem, it won't save the already generated
compressed nodatasum extent.

So 1) is still needed anyway, if only 2) is applied, a replace can still
corrupt affected extents.
(And 1) can be super small to get into current pull request)

> 
>> +
>> +# Write the compressed data back to disk
>> +sync
>> +
>> +# Replace the device
>> +_run_btrfs_util_prog replace start -Bf 1 $SPARE_DEV $SCRATCH_MNT
>> +
>> +# Unmount to drop all cache so next read will read from disk
>> +_scratch_unmount
>> +_mount $SPARE_DEV $SCRATCH_MNT
>> +
>> +# Now the EXTENT_DATA item still marks the extent as compressed,
> 
> I think the actual bug is that EXTENT_DATA item shouldn't be marked as
> compressed in the first place because the indoe is in nodatasum mode, no ?

Yes, but also explained above, the damage is already done, and even we
fix that way, the affected users/inodes can still hit the corruption bug
anyway.

So the upstream fix is more or less a reminder of what we did wrong in
the far past.

Thanks,
Qu

> 
> All things considered, isn't the actual bug that an extent item is
> erroneously flagged as compressed when it shouldn't have been due to the
> presence of nodatasum?
> 
>> +# but the on-disk data is uncompressed, thus reading it as compressed
>> +# will definitely cause EIO.
>> +cat $SCRATCH_MNT/nodatasum_file > /dev/null
>> +
>> +_scratch_unmount
>> +_spare_dev_put
>> +_scratch_dev_pool_put
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/165.out b/tests/btrfs/165.out
>> new file mode 100644
>> index 000000000000..94ec17dc1075
>> --- /dev/null
>> +++ b/tests/btrfs/165.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 165
>> +Silence is golden
>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>> index 35354de2ea6f..91a1ebadae7c 100644
>> --- a/tests/btrfs/group
>> +++ b/tests/btrfs/group
>> @@ -167,3 +167,4 @@
>>  162 auto quick volume
>>  163 auto quick volume
>>  164 auto quick volume
>> +165 auto quick replace volume
>>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html