Re: [PATCH v3] fstests: btrfs: add test for qgroup handle extent de-reference

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Tue, Jun 14, 2016 at 10:08 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx> wrote:
>
>
> At 06/14/2016 04:41 PM, Filipe Manana wrote:
>>
>> On Tue, Jun 14, 2016 at 1:58 AM, Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
>> wrote:
>>>
>>>
>>>
>>> At 06/13/2016 05:49 PM, Filipe Manana wrote:
>>>>
>>>>
>>>> On Mon, Jun 13, 2016 at 9:06 AM, Lu Fengqi <lufq.fnst@xxxxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>>
>>>>> At 06/13/2016 03:29 PM, Lu Fengqi wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> At 06/13/2016 11:04 AM, Eryu Guan wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 13, 2016 at 10:10:50AM +0800, Lu Fengqi wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Test if qgroup can handle extent de-reference during reallocation.
>>>>>>>> "extent de-reference" means that reducing an extent's reference
>>>>>>>> count
>>>>>>>> or freeing an extent.
>>>>>>>> Although current qgroup can handle it, we still need to prevent any
>>>>>>>> regression which may break current qgroup.
>>>>>>>>
>>>>>>>> Signed-off-by: Lu Fengqi <lufq.fnst@xxxxxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  common/rc           |  4 +--
>>>>>>>>  tests/btrfs/028     | 98
>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  tests/btrfs/028.out |  2 ++
>>>>>>>>  tests/btrfs/group   |  1 +
>>>>>>>>  4 files changed, 103 insertions(+), 2 deletions(-)
>>>>>>>>  create mode 100755 tests/btrfs/028
>>>>>>>>  create mode 100644 tests/btrfs/028.out
>>>>>>>>
>>>>>>>> diff --git a/common/rc b/common/rc
>>>>>>>> index 51092a0..650d198 100644
>>>>>>>> --- a/common/rc
>>>>>>>> +++ b/common/rc
>>>>>>>> @@ -3284,9 +3284,9 @@ _btrfs_get_profile_configs()
>>>>>>>>  # stress btrfs by running balance operation in a loop
>>>>>>>>  _btrfs_stress_balance()
>>>>>>>>  {
>>>>>>>> -    local btrfs_mnt=$1
>>>>>>>> +    local options=$@
>>>>>>>>      while true; do
>>>>>>>> -        $BTRFS_UTIL_PROG balance start $btrfs_mnt
>>>>>>>> +        $BTRFS_UTIL_PROG balance start $options
>>>>>>>>      done
>>>>>>>>  }
>>>>>>>>
>>>>>>>> diff --git a/tests/btrfs/028 b/tests/btrfs/028
>>>>>>>> new file mode 100755
>>>>>>>> index 0000000..8cea49a
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/tests/btrfs/028
>>>>>>>> @@ -0,0 +1,98 @@
>>>>>>>> +#! /bin/bash
>>>>>>>> +# FS QA Test 028
>>>>>>>> +#
>>>>>>>> +# Test if qgroup can handle extent de-reference during
>>>>>>>> reallocation.
>>>>>>>> +# "extent de-reference" means that reducing an extent's reference
>>>>>>>> count
>>>>>>>> +# or freeing an extent.
>>>>>>>> +# Although current qgroup can handle it, we still need to prevent
>>>>>>>> any
>>>>>>>> +# regression which may break current qgroup.
>>>>>>>> +#
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> +#-----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> +# Copyright (c) 2016 Fujitsu. All Rights Reserved.
>>>>>>>> +#
>>>>>>>> +# This program is free software; you can redistribute it and/or
>>>>>>>> +# modify it under the terms of the GNU General Public License as
>>>>>>>> +# published by the Free Software Foundation.
>>>>>>>> +#
>>>>>>>> +# This program is distributed in the hope that it would be useful,
>>>>>>>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>>>> +# GNU General Public License for more details.
>>>>>>>> +#
>>>>>>>> +# You should have received a copy of the GNU General Public License
>>>>>>>> +# along with this program; if not, write the Free Software
>>>>>>>> Foundation,
>>>>>>>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> +#-----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> +#
>>>>>>>> +
>>>>>>>> +seq=`basename $0`
>>>>>>>> +seqres=$RESULT_DIR/$seq
>>>>>>>> +echo "QA output created by $seq"
>>>>>>>> +
>>>>>>>> +here=`pwd`
>>>>>>>> +tmp=/tmp/$$
>>>>>>>> +status=1    # failure is the default!
>>>>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>>>>>>>> +
>>>>>>>> +_cleanup()
>>>>>>>> +{
>>>>>>>> +    cd /
>>>>>>>> +    rm -f $tmp.*
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +# get standard environment, filters and checks
>>>>>>>> +. ./common/rc
>>>>>>>> +. ./common/filter
>>>>>>>> +
>>>>>>>> +# remove previous $seqres.full before test
>>>>>>>> +rm -f $seqres.full
>>>>>>>> +
>>>>>>>> +# real QA test starts here
>>>>>>>> +_supported_fs btrfs
>>>>>>>> +_supported_os Linux
>>>>>>>> +_require_scratch
>>>>>>>> +
>>>>>>>> +_scratch_mkfs
>>>>>>>> +_scratch_mount
>>>>>>>> +
>>>>>>>> +_run_btrfs_util_prog quota enable $SCRATCH_MNT
>>>>>>>> +_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
>>>>>>>> +
>>>>>>>> +# Increase the probability of generating de-refer extent, and
>>>>>>>> decrease
>>>>>>>> +# other.
>>>>>>>> +args=`_scale_fsstress_args -z \
>>>>>>>> +    -f write=10 -f unlink=10 \
>>>>>>>> +    -f creat=10 -f fsync=10 \
>>>>>>>> +    -f fsync=10 -n 100000 -p 2 \
>>>>>>>> +    -d $SCRATCH_MNT/stress_dir`
>>>>>>>> +echo "Run fsstress $args" >>$seqres.full
>>>>>>>> +$FSSTRESS_PROG $args >/dev/null 2>&1 &
>>>>>>>> +fsstress_pid=$!
>>>>>>>> +
>>>>>>>> +echo "Start balance" >>$seqres.full
>>>>>>>> +_btrfs_stress_balance -d $SCRATCH_MNT >/dev/null 2>&1 &
>>>>>>>> +balance_pid=$!
>>>>>>>> +
>>>>>>>> +# 30s is enough to trigger bug
>>>>>>>> +sleep $((30*$TIME_FACTOR))
>>>>>>>> +kill $fsstress_pid $balance_pid
>>>>>>>> +wait
>>>>>>>> +
>>>>>>>> +# kill _btrfs_stress_balance can't end balance, so call btrfs
>>>>>>>> balance cancel
>>>>>>>> +# to cancel running or paused balance.
>>>>>>>> +$BTRFS_UTIL_PROG balance cancel $SCRATCH_MNT &> /dev/null
>>>>>>>> +
>>>>>>>> +_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
>>>>>>>> +
>>>>>>>> +_scratch_unmount
>>>>>>>> +
>>>>>>>> +# generate a qgroup report and look for inconsistent groups
>>>>>>>> +$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | \
>>>>>>>> +            grep -q -E "Counts for qgroup.*are different"
>>>>>>>> +if [ $? -ne 0 ]; then
>>>>>>>> +    echo "Silence is golden"
>>>>>>>> +    # success, all done
>>>>>>>> +    status=0
>>>>>>>> +fi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm testing with 4.7-rc1 kernel and btrfs-progs v4.4, this test
>>>>>>> fails,
>>>>>>> which means btrfs check finds inconsistent groups. But according to
>>>>>>> your
>>>>>>> commit log, current kernel should pass the test. So is the failure
>>>>>>> expected?
>>>>>>>
>>>>>>> Also, just grep for different qgroup counts and print the message out
>>>>>>> if
>>>>>>> grep finds the message, so it breaks golden image on error and we
>>>>>>> know
>>>>>>> something really goes wrong. Right now test fails just because of
>>>>>>> missing "Silence is golden", which is unclear why it fails:
>>>>>>>
>>>>>>>  @@ -1,2 +1 @@
>>>>>>>   QA output created by 028
>>>>>>>  -Silence is golden
>>>>>>>
>>>>>>> Do the following instead:
>>>>>>>
>>>>>>>     $BTRFS_UTIL_PROG check ... | grep -E "..."
>>>>>>>     echo "Silence is golden"
>>>>>>>     status=0
>>>>>>>     exit
>>>>>>>
>>>>>>> And we see this on failure:
>>>>>>>
>>>>>>>  @@ -1,2 +1,3 @@
>>>>>>>   QA output created by 028
>>>>>>>  +Counts for qgroup id: 5 are different
>>>>>>>   Silence is golden
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Eryu
>>>>>>>
>>>>>>>> +
>>>>>>>> +exit
>>>>>>>> diff --git a/tests/btrfs/028.out b/tests/btrfs/028.out
>>>>>>>> new file mode 100644
>>>>>>>> index 0000000..2615f73
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/tests/btrfs/028.out
>>>>>>>> @@ -0,0 +1,2 @@
>>>>>>>> +QA output created by 028
>>>>>>>> +Silence is golden
>>>>>>>> diff --git a/tests/btrfs/group b/tests/btrfs/group
>>>>>>>> index da0e27f..35ecf59 100644
>>>>>>>> --- a/tests/btrfs/group
>>>>>>>> +++ b/tests/btrfs/group
>>>>>>>> @@ -30,6 +30,7 @@
>>>>>>>>  025 auto quick send clone
>>>>>>>>  026 auto quick compress prealloc
>>>>>>>>  027 auto replace
>>>>>>>> +028 auto qgroup balance
>>>>>>>>  029 auto quick clone
>>>>>>>>  030 auto quick send
>>>>>>>>  031 auto quick subvol clone
>>>>>>>> --
>>>>>>>> 2.5.5
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe fstests"
>>>>>>>> in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Excuse me, I tried several versions of the kernel (4.6-rc7+ / 4.7-rc1
>>>>>> /
>>>>>> 4.7-rc3), the test didn't fail. Can you provide me with more
>>>>>> information
>>>>>> about your test environment?
>>>>>>
>>>>>>
>>>>> Sorry, this test fails after I perform it more times. It seems to be a
>>>>> bug
>>>>> of btrfs qgroup. We will continue to track this bug.
>>>>
>>>>
>>>>
>>>> Which is already known and reported some time ago:
>>>>
>>>> http://thread.gmane.org/gmane.comp.file-systems.btrfs/56687/focus=56824
>>>>
>>>> It's another regression introduced in kernel 4.2 with the large
>>>> qgroups rewrite (data extents are not accounted anymore after
>>>> balance).
>>>> It's trivial to reproduce with:
>>>>
>>>>
>>>> #!/bin/bash -x
>>>>
>>>> MNT="/mnt/sdi"
>>>> DEV="/dev/sdi"
>>>>
>>>> mkfs.btrfs --nodesize=4096 -f $DEV
>>>> mount -t btrfs $DEV $MNT
>>>>
>>>> # fill metadata space, create a tree with 1 node and 2 leaves
>>>> for ((i = 1; i <= 10; i++)); do
>>>>         mkdir $MNT/d$i
>>>> done
>>>> for ((i = 1; i <= 1; i++)); do
>>>>         xfs_io -f -c "pwrite -S 0xaa 0 8K" $MNT/f$i
>>>> done
>>>>
>>>> btrfs fi sync $MNT
>>>> btrfs quota enable $MNT
>>>> btrfs quota rescan -w $MNT
>>>> btrfs qg show $MNT
>>>>
>>>> umount $MNT
>>>> btrfsck $DEV
>>>>
>>>> mount -t btrfs $DEV $MNT
>>>>
>>>> time btrfs balance start -d $MNT
>>>>
>>>> umount $MNT
>>>> btrfsck $DEV
>>>
>>>
>>> Yes, we also found this internally.
>>>
>>> And the direct cause is already located.
>>>
>>> The direct cause is, we're increasing new data extents' reference even
>>> the
>>> referencer(fs tree) is still referring to old extents.
>>>
>>> If using ftrace to trace add/run_delayed_data_ref() and
>>> qgroup_account_extent(), one should find that, new data extents'
>>> reference
>>> is increased with both data reloc tree and final referencer.(fs tree in
>>> the
>>> test script).
>>>
>>> In that timing, backref walk can't find any root referring the new data
>>> extents.
>>> The data reloc tree is the current referencer, but qgroup won't account
>>> data
>>> reloc tree.
>>> While new data extents have fs tree backref, fs tree still refers to the
>>> old
>>> extent. So backref walk won't count it either.
>>>
>>> And the commit_trans before we really merges reloc roots makes the
>>> accounting wrong.
>>>
>>> So it seems to be an existing problem before the qgroup rework.
>>
>>
>> How can it be an existing problem before the qgroup rework if this
>> example (and others more complex involving snapshots and a large
>> number of data extents) worked on 4.1 and previous kernels?
>
>
> Because in old kernels, backref walk is unreliable, especially for such
> tricky tree block swap thing.
>
> As you can see, it's very obvious that, while fs tree is still referring to
> the old extent, it's never possible to get nr_old/new_roots to anything
> other than 0 for new extent.
>
> So for old kernels, either backref walk got wrong(as it just happend), or it
> doesn't go through qgroup accounting at all.

>From a user point of view, the results are incorrect with 4.2+ and
correct with 4.1 and older kernels.
That's all that matters and that's what defines a regression.

>
> Thanks,
> Qu
>
>
>>
>>
>>
>>>
>>> The quick fix won't be hard, just re-dirty the new data extent at
>>> merge_reloc_roots() time.
>>>
>>> But I'm still investigating if we can fix it in a better method, to
>>> increase
>>> the new data extents' reference at correct time.
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>>>
>>>>
>>>> Also, please, when you send a new version of a patch, add a changelog
>>>> describing what changed between versions. See [1] for an example.
>>>> I noticed you do this for all patches you send.
>>>>
>>>> [1]
>>>>
>>>> https://btrfs.wiki.kernel.org/index.php/Developer%27s_FAQ#Repeated_submissions
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Lu
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux