Re: [RFC PATCH] check: try to fix the test device if it gets corrupted

Eryu Guan <eguan@xxxxxxxxxx> · Mon, 27 Mar 2017 16:51:03 +0800

Hi Ted,

On Sun, Mar 26, 2017 at 09:48:02PM -0400, Theodore Ts'o wrote:
> On Fri, Mar 03, 2017 at 06:01:29PM -0500, Theodore Ts'o wrote:
> > On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote:
> > > 
> > > <shrug> The test device isn't supposed to get corrupted, since it (at
> > > least in theory) should be an old filesystem.  That said, I suppose
> > > there's little point in banging around with a corrupt test fs.  Maybe we
> > > could go further and stop running if there's unfixable corruption?
> > 
> > Yes, that was the other alternative I considered.  In my case, though,
> > since I'm trying to get a sense of how many failures I have to deal
> > with, I really wanted a "make -k" behavior that would continue after
> > the first failure.  After all, all I was going to do was manually run
> > fsck, and then continue the run --- so we might as well have the check
> > script do it automatically and then allow things to continue.
> > 
> > We could make it be configurable, via a command-line option.  The -k
> > option isn't taken so we could have check -k that works like make -k
> > if you think that's better.  OTOH, perhaps making -k the default
> > behaviour is actually the better way to go, and in that case, maybe
> > it's not worth having the command-line flag?
> 
> Eryu, do you have any preferences or comments about how you'd like me
> to modify this patch for upstreaming?  (Attached is my current version
> of the patch).
> 
> Thanks!!

Sorry I lost this thread, I thought I've replied but apparently I didn't..

I agreed with both of you and Darrick, I think we can try to repair the
corrupted test fs, and if repair succeeds we can continue the test, and
stop running the whole test if repair fails.

> 
> 					- Ted
> 
> commit 727c737d1f0a40288fc897c0263fbf8e7a5db8b3
> Author: Theodore Ts'o <tytso@xxxxxxx>
> Date:   Wed Mar 1 19:54:08 2017 -0500
> 
>     check: try to fix the test device if it gets corrupted
>     
>     If the test device gets corrupted all subsequent tests will fail.  To
>     prevent this from causing all subsequent tests to be useless, try
>     repair the file system on TEST_DEV if possible.  We don't need to do
>     this with the scratch device since that file system gets recreated
>     each time anyway.
>     
>     Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> 
> diff --git a/check b/check
> index 2fcf385f..d253f744 100755
> --- a/check
> +++ b/check
> @@ -472,7 +472,11 @@ _summary()
>  _check_filesystems()
>  {
>  	if [ -f ${RESULT_DIR}/require_test ]; then
> -		_check_test_fs || err=true
> +		if ! _check_test_fs ; then
> +		    err=true
> +		    echo "Trying to repair broken TEST_DEV file system"
> +		    _repair_test_fs

Minor nit, need tab for indention in the if block.

> +		fi
>  		rm -f ${RESULT_DIR}/require_test*
>  	fi
>  	if [ -f ${RESULT_DIR}/require_scratch ]; then
> diff --git a/common/rc b/common/rc
> index 052f67aa..ce491f3f 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1172,6 +1172,28 @@ _repair_scratch_fs()
>      esac
>  }
>  
> +_repair_test_fs()
> +{

Minor nit, this function has mixed tab & space for indention too, use
tab for new functions.

> +    case $FSTYP in
> +    ext2|ext3|ext4)
> +        fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1
> +	if test $? -ge 4 ; then
> +	    echo "_repair_test_fs: couldn't repair filesystem on $device (see $seqres.full)"
> +
> +	    echo "_repair_test_fs: couldn't repair filesystem on $device" >>$seqres.full

We have _log_err() to do these to echos now.

> +	    echo "*** fsck.$FSTYP output ***"	>>$seqres.full
> +            cat $tmp.repair			>>$seqres.full
> +	    echo "*** end fsck.$FSTYP output"	>>$seqres.full
> +	    return 1
> +	fi
> +	return 0
> +	;;
> +    *)
> +	return 1

I think we should try to fix other filesystems too?

> +	;;
> +    esac
> +}
> +

And I'm wondering if a new helper function can be factored out and used
in both _repair_scratch_fs and _repair_scratch_fs? One of the problems I
see in doing this is that we don't have a _test_xfs_repair() counterpart
right now.

Thanks,
Eryu

>  _get_pids_by_name()
>  {
>      if [ $# -ne 1 ]
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html