Re: [PATCH v2 3/3] common/rc: Check call order of _require_dm_target and _require_scratch*

Eryu Guan <guan@xxxxxxx> · Sun, 12 Sep 2021 17:17:14 +0800

On Fri, Sep 10, 2021 at 06:34:05AM +0000, Shinichiro Kawasaki wrote:
> On Sep 10, 2021 / 10:48, Dave Chinner wrote:
> > On Wed, Sep 08, 2021 at 05:37:15PM +0900, Shin'ichiro Kawasaki wrote:
> > > When SCRATCH_DEV is not set and the test case does not call
> > > _require_scratch* before _require_dm_target, _require_block_device
> > > called from _require_dm_target fails to evaluate SCRATCH_DEV and
> > > the test case fails. The error message does not describe this
> > > cause, so it takes some time to track down.
> > 
> > You should quote the actual failure message here so we have some
> > idea of whether the message that was emitted was appropriate or not
> > without having to go and find out how the test failed...
> 
> Sorry about the missing information. As you found below, the message was
> "Usage: _require_block_device <dev>".
> 
> > 
> > > To make the failure reason easier to catch, check SCRATCH_DEV in
> > > _require_dm_target. If SCRATCH_DEV is not set, fail the test case
> > > and print a message asking to fix the call order of _require_scratch*
> > > and _require_dm_target. This improvement follows what _scratch_shutdown
> > > does for _require_scratch_shutdown.
> > 
> > Also, you don't need to describe the change in the commit message -
> > the patch does that. The first paragraph is all that is needed here
> > as it describes why you want to make the change.
> 
> I see. I will write "why" in the commit message, not "what". (In the past, I
> was advised to write "what" the patch does, but I think that advice applies
> only when the change is complicated.)
> 
> > 
> > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> > > ---
> > >  common/rc | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index dda5da06..cbec8aaa 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -1971,6 +1971,9 @@ _require_dm_target()
> > >  
> > >  	# require SCRATCH_DEV to be a valid block device with sane BLKFLSBUF
> > >  	# behaviour
> > > +	if [ -z "$SCRATCH_DEV" ]; then
> > > +		_fail "_require_dm_target: call _require_scratch* first in test"
> > > +	fi
> > >  	_require_block_device $SCRATCH_DEV
> > >  	_require_sane_bdev_flush $SCRATCH_DEV
> > >  	_require_command "$DMSETUP_PROG" dmsetup
> > 
> > That's a notrun case, not a fail.
> > 
> > Also, we report the error that has occurred, not how to resolve the
> > problem. That's because we might change behaviour in future and now
> > the error message tells people to do something that is
> > wrong/non-existent. As such, I think the premise this change is based
> > on is not really valid - people running fstests are assumed to have
> > a level of knowledge sufficient to trace a failing test and
> > determine what went wrong from the error reported. i.e. the error
> > message should state what the problem was, not describe a potential
> > solution.
> 
> Thank you for the comments. These are the points I missed. In the end I was
> able to track down the cause myself, so the improvement I suggested is only
> a minor one.
> 
> > 
> > Also, this is not the place to check if SCRATCH_DEV is set. The
> > check for a NULL device should be in _require_block_device(). Oh,
> > wait, it already is:
> > 
> > _require_block_device()
> > {
> > 	if [ -z "$1" ]; then
> > 		echo "Usage: _require_block_device <dev>" 1>&2
> > 		exit 1
> > 	fi
> > ....
> > }
> > 
> > And that's the error message the test emitted that you didn't
> > understand, right?
> 
> Right :)
> 
> > 
> > If so, the change here should really be to _require_block_device().
> > i.e.
> > 
> > 	if [ -z "$1" ]; then
> > 		_notrun "test requires a block device to be specified"
> > 	fi
> > 
> > A quick scan shows a bunch of similar _requires checks that do
> > similar things with poor error messages and 'exit 1' (e.g.
> > _require_local_device()). _requires rules should call _notrun if the
> > test should not run because of incorrect setup, not 'exit 1'.
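A minimal sketch of the suggested _require_block_device() change, runnable outside the fstests harness. The _notrun below is a hypothetical stub that only approximates the real one (print the reason, end the test without counting it as a failure). The second check, which rejects a path that is not a block device, is an assumption, because the rest of the real function body is elided as "...." above. Note that the diff calls `_require_block_device $SCRATCH_DEV` unquoted, so an empty SCRATCH_DEV arrives as no argument at all, and the `[ -z "$1" ]` test is what catches it.

```shell
#!/bin/bash

# Hypothetical stub for fstests' _notrun: report why the test is
# skipped and exit with status 0 so it is not counted as a failure.
_notrun()
{
	echo "not run: $*"
	exit 0
}

# Sketch of _require_block_device() using _notrun instead of 'exit 1'.
_require_block_device()
{
	# Catches both _require_block_device "" and an unquoted empty
	# variable, which expands to no argument at all.
	if [ -z "$1" ]; then
		_notrun "test requires a block device to be specified"
	fi
	# Assumed follow-up check: the argument must be a block device.
	if [ ! -b "$1" ]; then
		_notrun "require $1 to be a valid block device"
	fi
}
```

Running `SCRATCH_DEV=""; (_require_block_device $SCRATCH_DEV)` prints "not run: test requires a block device to be specified" and the subshell exits with status 0, so the result is reported as a skipped test rather than the bare usage message.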
> 
> Thank you for your thoughts. I walked through the _require_* bash functions in
> common/ and listed the 20 functions below, which call 'exit 1', _fail, or
> 'return 1' when their argument check fails:
> 
> --- list start ---
> 
> common/rc
> 
>   _require_scratch_size
>   _require_scratch_size_nocheck
>   _require_command *
>   _require_block_device *
>   _require_local_device *
>   _require_zoned_device *
>   _require_non_zoned_device *
>   _require_scratch_ext4_feature
>   _require_xfs_io_command
>   _require_fio
>   _require_batched_discard *
>   _require_chattr
>   _require_fs_sysfs
>   _require_scratch_feature
> 
> common/btrfs
> 
>   _require_btrfs_mkfs_feature
>   _require_btrfs_fs_feature
> 
> common/xfs
> 
>   _require_xfs_db_command
>   _require_xfs_spaceman_command
> 
> common/encrypt
> 
>   _require_encryption_policy_support (checks arguments passed from _require_scratch_encryption)
> 
> common/renameat2
> 
>   _require_renameat2
> 
> --- list end ---
> 
> Many of the functions above check their arguments not to catch incorrect
> setup, but to catch test cases that call them with invalid arguments. The 6
> functions marked with * in the list check arguments that come from the setup,
> such as DEBUGFS_PROG, SCRATCH_DEV or SCRATCH_MNT. So I suggest modifying these
> functions to improve their error messages and call _notrun. What do you think?

IMO the _fail calls in the above _require* rules indicate function usage
errors, which are bugs in the test code, whereas _notrun indicates that a
required condition is not met for this test.
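To make that distinction concrete, here is a small sketch of a _require* rule that uses both outcomes. _require_tool is a hypothetical helper, not an existing fstests function, and _fail/_notrun are simplified stubs that only approximate the real ones: a wrong argument count is a bug in the calling test and goes to _fail, while a tool missing from the system is an unmet requirement and goes to _notrun.

```shell
#!/bin/bash

# Hypothetical stub for fstests' _fail: the test itself is broken.
_fail()
{
	echo "FAIL: $*" 1>&2
	exit 1
}

# Hypothetical stub for fstests' _notrun: the test cannot run here.
_notrun()
{
	echo "not run: $*"
	exit 0
}

# Hypothetical _require* rule showing the two outcomes.
_require_tool()
{
	# Usage error: the test called the helper incorrectly.
	if [ $# -ne 1 ]; then
		_fail "usage: _require_tool <command>"
	fi
	# Unmet requirement: the command is not available on this system.
	if ! command -v "$1" >/dev/null 2>&1; then
		_notrun "$1 utility required, skipped this test"
	fi
}
```

Calling `_require_tool` with no argument exits with status 1 (a test bug), while `_require_tool some_missing_tool` exits with status 0 after printing the "not run" reason.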

Thanks,
Eryu

P.S. I've applied the first two patches, thanks for the fix!

> 
> -- 
> Best Regards,
> Shin'ichiro Kawasaki