On Fri, Sep 20, 2019 at 05:18:00PM +0800, kaixuxia wrote:
> 
> On 2019/9/19 20:26, Brian Foster wrote:
> > On Thu, Sep 19, 2019 at 08:14:10PM +0800, kaixuxia wrote:
> >>
> >>
> >> On 2019/9/19 18:47, Brian Foster wrote:
> >>> On Thu, Sep 19, 2019 at 05:08:04PM +0800, kaixuxia wrote:
> >>>>
> >>>>
> >>>> On 2019/9/18 21:59, Brian Foster wrote:
> >>>>> On Wed, Sep 18, 2019 at 07:49:22PM +0800, kaixuxia wrote:
> >>>>>> There is an ABBA deadlock between the AGI and AGF when performing
> >>>>>> rename() with the RENAME_WHITEOUT flag. Add this test case to make
> >>>>>> sure the rename() call works well.
> >>>>>>
> >>>>>> Signed-off-by: kaixuxia <kaixuxia@xxxxxxxxxxx>
> >>>>>> ---
> >>>>>
> >>>>> FYI, for some reason your patch series isn't threaded on the mailing
> >>>>> list. I thought git send-email did this by default. Assuming you're not
> >>>>> explicitly using --no-thread, you might have to use the --thread option
> >>>>> so this gets posted as a proper series.
> >>>>>
> >>>> Yeah, thanks!
> >>>>>>  tests/xfs/512     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>  tests/xfs/512.out |  2 ++
> >>>>>>  tests/xfs/group   |  1 +
> >>>>>>  3 files changed, 99 insertions(+)
> >>>>>>  create mode 100755 tests/xfs/512
> >>>>>>  create mode 100644 tests/xfs/512.out
> >>>>>>
> >>>>>> diff --git a/tests/xfs/512 b/tests/xfs/512
> >>>>>> new file mode 100755
> >>>>>> index 0000000..a2089f0
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512
> >>>>>> @@ -0,0 +1,96 @@
> >>>>>> +#! /bin/bash
> >>>>>> +# SPDX-License-Identifier: GPL-2.0
> >>>>>> +# Copyright (c) 2019 Tencent.  All Rights Reserved.
> >>>>>> +#
> >>>>>> +# FS QA Test 512
> >>>>>> +#
> >>>>>> +# Test the ABBA deadlock case between the AGI and AGF when performing
> >>>>>> +# a rename operation with the RENAME_WHITEOUT flag.
> >>>>>> +#
> >>>>>> +seq=`basename $0`
> >>>>>> +seqres=$RESULT_DIR/$seq
> >>>>>> +echo "QA output created by $seq"
> >>>>>> +
> >>>>>> +here=`pwd`
> >>>>>> +tmp=/tmp/$$
> >>>>>> +status=1	# failure is the default!
> >>>>>> +trap "_cleanup; exit \$status" 0 1 2 3 15
> >>>>>> +
> >>>>>> +_cleanup()
> >>>>>> +{
> >>>>>> +	cd /
> >>>>>> +	rm -f $tmp.*
> >>>>>> +}
> >>>>>> +
> >>>>>> +# get standard environment, filters and checks
> >>>>>> +. ./common/rc
> >>>>>> +. ./common/filter
> >>>>>> +. ./common/renameat2
> >>>>>> +
> >>>>>> +rm -f $seqres.full
> >>>>>> +
> >>>>>> +# real QA test starts here
> >>>>>> +_supported_fs xfs
> >>>>>> +_supported_os Linux
> >>>>>> +# A single AG will cause the default xfs_repair run to fail. This
> >>>>>> +# test needs a single-AG fs, so skip the check.
> >>>>>> +_require_scratch_nocheck
> >>>>>> +_requires_renameat2 whiteout
> >>>>>> +
> >>>>>> +filter_enospc() {
> >>>>>> +	sed -e '/^.*No space left on device.*/d'
> >>>>>> +}
> >>>>>> +
> >>>>>> +create_file()
> >>>>>> +{
> >>>>>> +	local target_dir=$1
> >>>>>> +	local files_count=$2
> >>>>>> +
> >>>>>> +	for i in $(seq 1 $files_count); do
> >>>>>> +		( echo > $target_dir/f$i ) 2>&1 | filter_enospc
> >>>>>> +	done
> >>>>>> +}
> >>>>>> +
> >>>>>> +rename_whiteout()
> >>>>>> +{
> >>>>>> +	local target_dir=$1
> >>>>>> +	local files_count=$2
> >>>>>> +
> >>>>>> +	# a long filename increases the chance that target_dp has to
> >>>>>> +	# allocate new blocks (acquiring the AGF lock) to store it
> >>>>>> +	longnamepre=`$PERL_PROG -e 'print "a"x200;'`
> >>>>>> +
> >>>>>> +	# now try to do the rename with the RENAME_WHITEOUT flag
> >>>>>> +	for i in $(seq 1 $files_count); do
> >>>>>> +		src/renameat2 -w $SCRATCH_MNT/f$i $target_dir/$longnamepre$i >/dev/null 2>&1
> >>>>>> +	done
> >>>>>> +}
> >>>>>> +
> >>>>>> +_scratch_mkfs_xfs -d agcount=1 >> $seqres.full 2>&1 ||
> >>>>>> +	_fail "mkfs failed"
> >>>>>
> >>>>> This appears to be the only XFS specific bit. Could it be
> >>>>> conditionalized using FSTYP such that this test could go under
> >>>>> tests/generic?
> >>>>>
> >>>> OK, I'll move this test to tests/generic by using FSTYP.
> >>>>
> >>>>>> +_scratch_mount
> >>>>>> +
> >>>>>> +# set the rename and create file counts
> >>>>>> +file_count=50000
> >>>>>> +
> >>>>>> +# create the directories needed for the create and rename operations
> >>>>>> +createdir=$SCRATCH_MNT/createdir
> >>>>>> +mkdir $createdir
> >>>>>> +renamedir=$SCRATCH_MNT/renamedir
> >>>>>> +mkdir $renamedir
> >>>>>> +
> >>>>>> +# create many small files for the rename with RENAME_WHITEOUT
> >>>>>> +create_file $SCRATCH_MNT $file_count
> >>>>>> +
> >>>>>> +# run creates and whiteout renames concurrently to hit the deadlock
> >>>>>> +rename_whiteout $renamedir $file_count &
> >>>>>> +create_file $createdir $file_count &
> >>>>>> +
> >>>>>
> >>>>> When I ran this test I noticed that the rename_whiteout task completed
> >>>>> renaming the 50k files before the create_file task created even 30k of
> >>>>> the 50k files. There's no risk of deadlock once one of these tasks
> >>>>> completes, right? If so, that seems like something that could be fixed
> >>>>> up.
> >>>>>
> >>>>> Beyond that though, the test itself ran for almost 19 minutes on a vm
> >>>>> with the deadlock fix. That seems like overkill to me for a test that's
> >>>>> so narrowly focused on a particular bug that it's unlikely to fail in
> >>>>> the future. If we can't find a way to get this down to a reasonable time
> >>>>> while still reproducing the deadlock, I'm kind of wondering if there's a
> >>>>> better approach to get more rename coverage from existing tests. For
> >>>>> example, could we add this support to fsstress and see if any of the
> >>>>> existing stress tests might trigger the original problem? Even if we
> >>>>> needed to add a new rename/create focused fsstress test, that might at
> >>>>> least be general enough to provide broader coverage.
> >>>>>
> >>>> Yeah, the rename_whiteout task runs faster than the create_file task,
> >>>> so maybe we can set two different file counts for them to reduce the
> >>>> test run time. This test ran for 380s on my vm with the fixed kernel,
> >>>> but we still need to find a way to reduce the run time, like the 19
> >>>> minutes case.
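
For illustration, the early-exit fixup hinted at above might look
something like the following untested sketch, reusing the helpers from
the patch (once the rename worker is done the deadlock can no longer
trigger, so letting the create workload keep running is wasted time):

  rename_whiteout $renamedir $file_count &
  rename_pid=$!
  create_file $createdir $file_count &
  create_pid=$!

  # reap the rename worker, then stop the now-pointless create worker
  wait $rename_pid
  kill $create_pid >/dev/null 2>&1	# best-effort; it may have finished
  wait $create_pid 2>/dev/null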
> >>>> Actually, in most cases, the deadlock happened when the
> >>>> rename_whiteout task had completed renaming hundreds of files. 50000
> >>>> is set just because this test takes 380s on my vm, which is
> >>>> acceptable, and the reproduce possibility is near 100%. So maybe we
> >>>> can choose a proper file count to make the test run faster. Of
> >>>> course, I'll also try to use fsstress and the TIME_FACTOR if they
> >>>> can help to reduce the run time.
> >>>>
> >>>
> >>> I think using different file counts as such is too unpredictable across
> >>> different test environments. If we end up with something like the
> >>> current test, I'd rather see explicit logic in the test to terminate the
> >>> workload thread when the rename thread completes. This probably would
> >>> have knocked 2-3 minutes off the slow runtime I reported above.
> >>>
> >>> That aside, I think the fsstress approach is preferable because there is
> >>> at least potential to avoid the need for a new test. The relevant
> >>> questions to me are:
> >>>
> >>> 1.) If you add renameat2 support to fsstress, do any of the existing
> >>> fsstress tests reproduce the original problem?
> >>
> >> Not sure about this; I need to research whether any of the existing
> >> fsstress tests can reproduce the problem.
> >
> > Right, but this will require some work to fsstress to add renameat2
> > support. To Eryu's earlier point, however, that is probably a useful
> > patch regardless of what approach we take below.
> 
> Yeah, if we take the fsstress approach to reproduce the deadlock we need
> to add renameat2 support to fsstress. The deadlock is a corner case and
> needs some special settings for a higher reproduce possibility, such as
> a smaller bsize and agcount, a longer rename target filename, and all
> the rename calls having the same target_dp to acquire the AGF lock
> (xfs_dir_createname()), so we may need a separate test using customized
> parameters (limited to rename(whiteout) and creates). If not, the
> possibility would be lower and we would also need a longer run time.
> 
> > 
> >>>
> >>> 2.) If not, can fsstress reproduce the problem using customized
> >>> parameters (i.e., limited to rename and creates)? If so, we may still
> >>> need a separate test, but it would be trivial in that it just invokes
> >>> fsstress with particular flags for a period of time.
> >>>
> >>> 3.) If not, then we need to find a way for this test to run quicker. At
> >>> this point, I'm curious how long it takes for this test to reproduce the
> >>> problem (on a broken kernel) on average once the setup portion
> >>> completes. More than a minute or two, for example, or tens of minutes..
> >>> etc.?
> >>>
> >> About five minutes with a 50000 file count on a broken kernel to
> >> reproduce the deadlock on my vm, and most of that time is spent
> >> preparing the 50000 empty files for the rename operation.
> >>
> > 
> > Ok, so how much time is that outside of the file creation part? You
> > could add some timestamps to the test to figure that out if necessary.
> > How many files were renamed before the deadlock occurred?
> > 
> About one or two minutes outside of the file creation on my vm.
> The number of files renamed before the deadlock occurred is not fixed;
> the most common range is from 500 to 10000 files. Of course, this range
> may change on different test environments...
> 

Ok. So FWIW a 10k file count and immediate exit after the renameat task
completes is a less noticeable 3m45s on my low end vm.
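
If renameat2 support did land in fsstress, the "trivial" wrapper test
from question 2 could presumably boil down to a single invocation along
these lines (a sketch only: the "rwhiteout" op name is hypothetical and
does not exist in fsstress today, while -z zeroes all default op
frequencies so that only creates and whiteout renames run):

  $FSSTRESS_PROG -z -f creat=50 -f rwhiteout=50 \
	-d $SCRATCH_MNT/fsstress -p 4 -n 10000 >> $seqres.full 2>&1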
I'd still prefer to see an fsstress test if possible, but the 10k file
count with an immediate exit might be a reasonable fallback option.

Brian

> > Brian
> > 
> >> An example where the deadlock happened when renaming 2729 files.
> >> Call trace:
> >> root  31829 ... D+ ... /renamedir/aaaaaaaaaaaaaaaaaaa...aaaaaaaaaaaaaaa2729
> >> # cat /proc/31829/stack
> >> [<0>] xfs_buf_lock+0x34/0xf0 [xfs]
> >> [<0>] xfs_buf_find+0x215/0x6c0 [xfs]
> >> [<0>] xfs_buf_get_map+0x37/0x230 [xfs]
> >> [<0>] xfs_buf_read_map+0x29/0x190 [xfs]
> >> [<0>] xfs_trans_read_buf_map+0x13d/0x520 [xfs]
> >> [<0>] xfs_read_agi+0xa8/0x160 [xfs]
> >> [<0>] xfs_iunlink_remove+0x6f/0x2a0 [xfs]
> >> [<0>] xfs_rename+0x57a/0xae0 [xfs]
> >> [<0>] xfs_vn_rename+0xe4/0x150 [xfs]
> >> [<0>] vfs_rename+0x1f4/0x7b0
> >> [<0>] do_renameat2+0x431/0x4c0
> >> [<0>] __x64_sys_renameat2+0x20/0x30
> >> [<0>] do_syscall_64+0x49/0x120
> >> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> 
> >>> Brian
> >>> 
> >>>>> Alternatively, what if this test ran a create/rename workload (on a
> >>>>> smaller fileset) for a fixed time of a minute or two and then exited? I
> >>>>> think it would be a reasonable compromise if the test still reproduced
> >>>>> on some smaller frequency, it's just not clear to me how effective such
> >>>>> a test would be without actually trying it. Maybe Eryu has additional
> >>>>> thoughts..
> >>>>>
> >>>>> Brian
> >>>>>
> >>>>>> +wait
> >>>>>> +echo Silence is golden
> >>>>>> +
> >>>>>> +# Failure comes in the form of a deadlock.
> >>>>>> +
> >>>>>> +# success, all done
> >>>>>> +status=0
> >>>>>> +exit
> >>>>>> diff --git a/tests/xfs/512.out b/tests/xfs/512.out
> >>>>>> new file mode 100644
> >>>>>> index 0000000..0aabdef
> >>>>>> --- /dev/null
> >>>>>> +++ b/tests/xfs/512.out
> >>>>>> @@ -0,0 +1,2 @@
> >>>>>> +QA output created by 512
> >>>>>> +Silence is golden
> >>>>>> diff --git a/tests/xfs/group b/tests/xfs/group
> >>>>>> index a7ad300..ed250d6 100644
> >>>>>> --- a/tests/xfs/group
> >>>>>> +++ b/tests/xfs/group
> >>>>>> @@ -509,3 +509,4 @@
> >>>>>>  509 auto ioctl
> >>>>>>  510 auto ioctl quick
> >>>>>>  511 auto quick quota
> >>>>>> +512 auto rename
> >>>>>> -- 
> >>>>>> 1.8.3.1
> >>>>>> 
> >>>>>> -- 
> >>>>>> kaixuxia
> >>>> 
> >>>> -- 
> >>>> kaixuxia
> >> 
> >> -- 
> >> kaixuxia
> 
> -- 
> kaixuxia
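
For completeness, the fixed-time variant floated earlier in the thread
(run the create/rename workload for a bounded interval, then stop
regardless of how many files were processed) could be sketched as
below, again reusing the patch's helpers and the standard TIME_FACTOR
scaling; this is an untested illustration, not a tested reproducer:

  # bound the workload by wall-clock time rather than by file count
  runtime=$((120 * TIME_FACTOR))
  end=$(( $(date +%s) + runtime ))

  create_file $createdir $file_count &
  create_pid=$!
  rename_whiteout $renamedir $file_count &
  rename_pid=$!

  while [ $(date +%s) -lt $end ]; do
	# stop polling early if the rename worker already finished
	kill -0 $rename_pid 2>/dev/null || break
	sleep 1
  done
  kill $create_pid $rename_pid >/dev/null 2>&1
  wait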