Re: [PATCH v2] xfstests 293: test xfs direct IO nested transaction deadlock

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 18 Jan 2013 11:28:18 +1100

On Thu, Jan 17, 2013 at 03:27:49PM +0800, Eryu Guan wrote:
> Regression test case for commit:
> 
> 437a255 xfs: fix direct IO nested transaction deadlock.
> 
> Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx>

Couple of things:

> +_scratch_mkfs_xfs -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b >/dev/null 2>&1
> +_scratch_mount >/dev/null 2>&1
> +
> +TESTDIR="$SCRATCH_MNT/testdir"
> +mkdir -p $TESTDIR

That's far too similar to TEST_DIR - the mount point for the test
device.  Reading the test makes me wonder why fsstress is beig run
on the test device, while the freezes are being done on the scratch
device.

Other tests use $testdir, which is fine because the difference in
case makes it obvious that it's not the test device mount point. I'd
prefer you cal it something like "STRESS_DIR", though, so it's
obvious what it is being used for...

> +$FSSTRESS_PROG -d $TESTDIR -n 100 -p 1000 $FSSTRESS_AVOID >/dev/null 2>&1 &
> +
> +# Freeze/unfreeze file system randomly
> +echo "Start freeze/unfreeze randomly" | tee -a $seq.full
> +LOOP=10
> +while [ $LOOP -gt 0 ];do
> +	TIMEOUT=`expr $RANDOM % 5`
> +	sleep $TIMEOUT
> +	echo "* Freeze file system after sleeping $TIMEOUT seconds" >>$seq.full
> +	xfs_freeze -f $SCRATCH_MNT
> +	if [ $? -ne 0 ];then
> +		echo " - Error: freeze filesystem failed" | tee -a $seq.full
> +	fi
> +	TIMEOUT=`expr $RANDOM % 3`
> +	sleep $TIMEOUT
> +	echo "* Unfreeze file system after sleeping $TIMEOUT seconds" >>$seq.full
> +	xfs_freeze -u $SCRATCH_MNT
> +	if [ $? -ne 0 ];then
> +		echo " - Error: unfreeze filesystem failed" | tee -a $seq.full
> +	fi
> +	((LOOP--))

Does that sort of decrement construct work on older versions of
bash?

> +done
> +echo "Test done" | tee -a $seq.full
> +killall -q $FSSTRESS_PROG
> +sync

No need for the sync, _check_scratch_fs will do it as part of the
unmount. FWIW you probably need a "wait" call here to wait for all
the fsstress processes to exit otherwise the unmount will fail.

In fact, I bet that's what the sync call does - many of the 1000
fsstress processes will be blocked waiting for queued sys_sync()
operations to complete as the kernel serialises them and fsstress
issues them faster than they can be completed. queuing another sync
will effectively wait for them all to complete and die. I often saw
700-800 of the fsstress processes stuck like this when testing the
fix for the bug this test exposed...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs