On Wed, Sep 30, 2020 at 5:14 AM Anand Jain <anand.jain@xxxxxxxxxx> wrote: > > On 30/9/20 1:26 am, Josef Bacik wrote: > > On 9/29/20 12:13 PM, Filipe Manana wrote: > >> On Tue, Sep 29, 2020 at 5:02 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > >>> > >>> On 9/29/20 11:55 AM, Filipe Manana wrote: > >>>> On Tue, Sep 29, 2020 at 4:50 PM Anand Jain <anand.jain@xxxxxxxxxx> > >>>> wrote: > >>>>> > >>>>> btrfs/064 aimed to test balance and replace concurrency while the > >>>>> stress > >>>>> test is running in the background. > >>>>> > >>>>> However, as the balance and the replace operation are mutually > >>>>> exclusive, so they can never run concurrently. > >>>> > >>>> And it's good to have a test that verifies that attempting to run them > >>>> concurrently doesn't cause any problems, like crashes, memory leaks or > >>>> some sort of filesystem corruption. > >>>> > >>>> For example btrfs/187, which I wrote sometime ago, tests that running > >>>> send, balance and deduplication in parallel doesn't result in crashes, > >>>> since in the past they were allowed to run concurrently. > >>>> > >>>> I see no point in removing the test, it's useful. > >>> > >>> My confusion was around whether this test was actually testing what we > >>> think it should be testing. If this test was meant to make sure that > >>> replace works while we've got load on the fs, then clearly it's not > >>> doing what we think it's doing. > >> > >> Given that neither the test's description nor the changelog mention > >> that it expects device replace and balance to be able to run > >> concurrently, > >> that errors are explicitly ignored and redirected to $seqres.full, and > >> we don't do any sort of validation after device replace operations, it > >> makes it clear to me it's a stress test. > >> > > > > Sure but I spent a while looking at it when it was failing being very > > confused. In my mind my snapshot-stress.sh is a stress test, because > > its meant to run without errors. The changelog and description are > > sufficiently vague enough that it appeared that Eryu meant to write a > > test that actually did a replace and balance at the same time. The test > > clearly isn't doing that, so we need to update the description so it's > > clear that's what's going on. And then I wanted to make sure that we do > > in fact have a test that stresses replace in these scenarios, because I > > want to make sure we actually test replace as well. > > > > Not ripping it out is fine, but updating the description so I'm not > > confused in a couple years when I trip over this again would be nice. > > Thanks, > > > > As of now, we have the following balance concurrency tests. > ----- > 028 balance and unlink fsstress concurrency [1] > 060 balance and subvol ops concurrency with fsstress [2] > 061 balance and scrub concurrency with fsstress [2] > 062 balance and defrag concurrency with fsstress [2] > 063 balance and remount concurrency with fsstress [2] > 064 balance and replace concurrency with fsstress [2] > 177 balance and resize concurrency No, 177 does not test balance and resize concurrency. It tests balance when a swap file exists. And the resize happens (starts and ends) before setting the swap file and before doing the balance. Thanks. > 187 balance, send and dedupe concurrency > 190 balance with qgroup > > [1] > args=`_scale_fsstress_args -z \ > -f write=10 -f unlink=10 \ > -f creat=10 -f fsync=10 \ > -f fsync=10 -n 100000 -p 2 \ > -d $SCRATCH_MNT/stress_dir` > > [2] > args=`_scale_fsstress_args -p 20 -n 100 $FSSTRESS_AVOID -d > $SCRATCH_MNT/stressdir` > ----- > > 064 shall test balance with fsstress in the background. The replace > thread is kept out with the early check of BTRFS_FS_EXCL_OP in the kernel. > I am ok with the 064 headers updated, will send v2. > > > Also, it turns out that this test case helped to find a btrfs-progs bug. > Its patch [1] is sent to the ML. > [1] btrfs-progs: fix return code for failed replace start > > Thanks, Anand > > > > Josef > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”