Re: [PATCH v3] fstests: btrfs: Test if btrfs will corrupt nodatasum compressed extent when replacing device

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]




On 14.06.2018 10:04, Qu Wenruo wrote:
> 
> 
> On 2018年06月14日 14:45, Nikolay Borisov wrote:
>>
>>
>> On 14.06.2018 09:30, Qu Wenruo wrote:
>>> This is a long existing bug (from 2012) but exposed by a reporter
>>> recently, that when compressed extent without data csum get written to
>>> device-replace target device, the written data is in fact uncompressed data
>>> other than the original compressed data.
>>>
>>> And since btrfs still consider the data is compressed and will try to read it
>>> as compressed, it can cause read error.
>>>
>>> The root cause is located, and fix already sent.
>>> "btrfs: scrub: Don't use inode pages for device replace".
>>>
>>> Reported-by: James Harvey <jamespharvey20@xxxxxxxxx>
>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>
>> Codewise lgtm:
>>
>> Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx>
>>
>> However, I'm having a bit of trouble reconciling the explanation so bear
>> with me, please look below.
>>
>>> ----
>>> changelog:
>>> v2:
>>>   Now the fix patch is no longer RFC.
>>>   Remove _require_test as we don't really touch it.
>>>   Add comment on the mount cycle.
>>>   Add the test to group 'volume'.
>>> v3:
>>>   Use latest template.
>>>   Rebased to latest upstream base.
>>> ---
>>>  tests/btrfs/165     | 76 +++++++++++++++++++++++++++++++++++++++++++++
>>>  tests/btrfs/165.out |  2 ++
>>>  tests/btrfs/group   |  1 +
>>>  3 files changed, 79 insertions(+)
>>>  create mode 100755 tests/btrfs/165
>>>  create mode 100644 tests/btrfs/165.out
>>>
>>> diff --git a/tests/btrfs/165 b/tests/btrfs/165
>>> new file mode 100755
>>> index 000000000000..eb9bb61c9ea3
>>> --- /dev/null
>>> +++ b/tests/btrfs/165
>>> @@ -0,0 +1,76 @@
>>> +#! /bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
>>> +#
>>> +# FS QA Test 165
>>> +#
>>> +# Test if btrfs will corrupt compressed data extent without data csum
>>> +# by replacing it with uncompressed data, when doing device replace.
>>> +#
>>> +# This could be fixed by the following patch:
>>> +# "btrfs: scrub: Don't use inode pages for device replace"
>>> +#
>>> +seq=`basename $0`
>>> +seqres=$RESULT_DIR/$seq
>>> +echo "QA output created by $seq"
>>> +
>>> +here=`pwd`
>>> +tmp=/tmp/$$
>>> +status=1	# failure is the default!
>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>> +
>>> +_cleanup()
>>> +{
>>> +	cd /
>>> +	rm -f $tmp.*
>>> +}
>>> +
>>> +# get standard environment, filters and checks
>>> +. ./common/rc
>>> +. ./common/filter
>>> +
>>> +# remove previous $seqres.full before test
>>> +rm -f $seqres.full
>>> +
>>> +# real QA test starts here
>>> +
>>> +# Modify as appropriate.
>>> +_supported_fs btrfs
>>> +_supported_os Linux
>>> +_require_scratch_dev_pool 2
>>> +_require_scratch_dev_pool_equal_size
>>> +
>>> +_scratch_dev_pool_get 1
>>> +_spare_dev_get
>>> +_scratch_pool_mkfs >> $seqres.full 2>&1
>>> +
>>> +# Create nodatasum inode
>>> +_scratch_mount "-o nodatasum"
>>> +touch $SCRATCH_MNT/nodatasum_file
>>
>> So we create an inode with nodatasum
>>
>>> +_scratch_remount "datasum,compress"
>>> +_pwrite_byte 0xcd 0 128K $SCRATCH_MNT/nodatasum_file > /dev/null
>>
>> Then we remount the fs with datasum and compress. Now, what should the
>> behavior of that particular inode be. IMO all rights to it should be
>> nodatasum and shouldn't really be compressed (because the inode retained
>> it's original nodatasum profile).
> 
> Yes, that should be the ultimate fix.
> 
> So here we have 2 fixes:
> 1) Just fix the inode page error
>    The fix submitted upstream
> 
> 2) Fix the nodatasum/nodatacow/compression mess
>    I submitted similar fix some time ago.
>    (that inode_need_compress() patch)
> 
> The 2) one has one problem, it won't save the already generated
> compressed nodatasum extent.
> 
> So 1) is still needed anyway, if only 2) is applied, a replace can still
> corrupt affected extents.
> (And 1) can be super small to get into current pull request)
> 
>>
>>> +
>>> +# Write the compressed data back to disk
>>> +sync
>>> +
>>> +# Replace the device
>>> +_run_btrfs_util_prog replace start -Bf 1 $SPARE_DEV $SCRATCH_MNT
>>> +
>>> +# Unmount to drop all cache so next read will read from disk
>>> +_scratch_unmount
>>> +_mount $SPARE_DEV $SCRATCH_MNT
>>> +
>>> +# Now the EXTENT_DATA item still marks the extent as compressed,
>>
>> I think the actual bug is that EXTENT_DATA item shouldn't be marked as
>> compressed in the first place because the indoe is in nodatasum mode, no ?
> 
> Yes, but also explained above, the damage is already done, and even we
> fix that way, the affected users/inodes can still hit the corruption bug
> anyway.

Fair enough, but in this case there are really 2 fixes. One which fixes
the root of the issue - which still hasn't landed. And it's changelog
should be along the lines of "ensure we don't compress inodes which have
been in nodatasum mode". And a second one, which fixes just the symptom
for already corrupted extents - this is the if (0...) commit which was
in latest pull req, correct?

> 
> So the upstream fix is more or less a reminder of what we did wrong in
> the far past.
> 
> Thanks,
> Qu
> 
>>
>> All things considered, isn't the actual bug that an extent item is
>> erroneously flagged as compressed when it shouldn't have been due to the
>> presence of nodatasum?
>>
>>> +# but the on-disk data is uncompressed, thus reading it as compressed
>>> +# will definitely cause EIO.
>>> +cat $SCRATCH_MNT/nodatasum_file > /dev/null
>>> +
>>> +_scratch_unmount
>>> +_spare_dev_put
>>> +_scratch_dev_pool_put
>>> +
>>> +echo "Silence is golden"
>>> +
>>> +# success, all done
>>> +status=0
>>> +exit
>>> diff --git a/tests/btrfs/165.out b/tests/btrfs/165.out
>>> new file mode 100644
>>> index 000000000000..94ec17dc1075
>>> --- /dev/null
>>> +++ b/tests/btrfs/165.out
>>> @@ -0,0 +1,2 @@
>>> +QA output created by 165
>>> +Silence is golden
>>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>>> index 35354de2ea6f..91a1ebadae7c 100644
>>> --- a/tests/btrfs/group
>>> +++ b/tests/btrfs/group
>>> @@ -167,3 +167,4 @@
>>>  162 auto quick volume
>>>  163 auto quick volume
>>>  164 auto quick volume
>>> +165 auto quick replace volume
>>>
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux