On 2018年06月14日 14:45, Nikolay Borisov wrote: > > > On 14.06.2018 09:30, Qu Wenruo wrote: >> This is a long existing bug (from 2012) but exposed by a reporter >> recently, that when compressed extent without data csum get written to >> device-replace target device, the written data is in fact uncompressed data >> other than the original compressed data. >> >> And since btrfs still consider the data is compressed and will try to read it >> as compressed, it can cause read error. >> >> The root cause is located, and fix already sent. >> "btrfs: scrub: Don't use inode pages for device replace". >> >> Reported-by: James Harvey <jamespharvey20@xxxxxxxxx> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > > Codewise lgtm: > > Reviewed-by: Nikolay Borisov <nborisov@xxxxxxxx> > > However, I'm having a bit of trouble reconciling the explanation so bear > with me, please look below. > >> ---- >> changelog: >> v2: >> Now the fix patch is no longer RFC. >> Remove _require_test as we don't really touch it. >> Add comment on the mount cycle. >> Add the test to group 'volume'. >> v3: >> Use latest template. >> Rebased to latest upstream base. >> --- >> tests/btrfs/165 | 76 +++++++++++++++++++++++++++++++++++++++++++++ >> tests/btrfs/165.out | 2 ++ >> tests/btrfs/group | 1 + >> 3 files changed, 79 insertions(+) >> create mode 100755 tests/btrfs/165 >> create mode 100644 tests/btrfs/165.out >> >> diff --git a/tests/btrfs/165 b/tests/btrfs/165 >> new file mode 100755 >> index 000000000000..eb9bb61c9ea3 >> --- /dev/null >> +++ b/tests/btrfs/165 >> @@ -0,0 +1,76 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved. >> +# >> +# FS QA Test 165 >> +# >> +# Test if btrfs will corrupt compressed data extent without data csum >> +# by replacing it with uncompressed data, when doing device replace. >> +# >> +# This could be fixed by the following patch: >> +# "btrfs: scrub: Don't use inode pages for device replace" >> +# >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> + >> +# remove previous $seqres.full before test >> +rm -f $seqres.full >> + >> +# real QA test starts here >> + >> +# Modify as appropriate. >> +_supported_fs btrfs >> +_supported_os Linux >> +_require_scratch_dev_pool 2 >> +_require_scratch_dev_pool_equal_size >> + >> +_scratch_dev_pool_get 1 >> +_spare_dev_get >> +_scratch_pool_mkfs >> $seqres.full 2>&1 >> + >> +# Create nodatasum inode >> +_scratch_mount "-o nodatasum" >> +touch $SCRATCH_MNT/nodatasum_file > > So we create an inode with nodatasum > >> +_scratch_remount "datasum,compress" >> +_pwrite_byte 0xcd 0 128K $SCRATCH_MNT/nodatasum_file > /dev/null > > Then we remount the fs with datasum and compress. Now, what should the > behavior of that particular inode be. IMO all rights to it should be > nodatasum and shouldn't really be compressed (because the inode retained > it's original nodatasum profile). Yes, that should be the ultimate fix. So here we have 2 fixes: 1) Just fix the inode page error The fix submitted upstream 2) Fix the nodatasum/nodatacow/compression mess I submitted similar fix some time ago. (that inode_need_compress() patch) The 2) one has one problem, it won't save the already generated compressed nodatasum extent. So 1) is still needed anyway, if only 2) is applied, a replace can still corrupt affected extents. (And 1) can be super small to get into current pull request) > >> + >> +# Write the compressed data back to disk >> +sync >> + >> +# Replace the device >> +_run_btrfs_util_prog replace start -Bf 1 $SPARE_DEV $SCRATCH_MNT >> + >> +# Unmount to drop all cache so next read will read from disk >> +_scratch_unmount >> +_mount $SPARE_DEV $SCRATCH_MNT >> + >> +# Now the EXTENT_DATA item still marks the extent as compressed, > > I think the actual bug is that EXTENT_DATA item shouldn't be marked as > compressed in the first place because the indoe is in nodatasum mode, no ? Yes, but also explained above, the damage is already done, and even we fix that way, the affected users/inodes can still hit the corruption bug anyway. So the upstream fix is more or less a reminder of what we did wrong in the far past. Thanks, Qu > > All things considered, isn't the actual bug that an extent item is > erroneously flagged as compressed when it shouldn't have been due to the > presence of nodatasum? > >> +# but the on-disk data is uncompressed, thus reading it as compressed >> +# will definitely cause EIO. >> +cat $SCRATCH_MNT/nodatasum_file > /dev/null >> + >> +_scratch_unmount >> +_spare_dev_put >> +_scratch_dev_pool_put >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/165.out b/tests/btrfs/165.out >> new file mode 100644 >> index 000000000000..94ec17dc1075 >> --- /dev/null >> +++ b/tests/btrfs/165.out >> @@ -0,0 +1,2 @@ >> +QA output created by 165 >> +Silence is golden >> diff --git a/tests/btrfs/group b/tests/btrfs/group >> index 35354de2ea6f..91a1ebadae7c 100644 >> --- a/tests/btrfs/group >> +++ b/tests/btrfs/group >> @@ -167,3 +167,4 @@ >> 162 auto quick volume >> 163 auto quick volume >> 164 auto quick volume >> +165 auto quick replace volume >> -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html