Re: [PATCH v2] xfs/098: fix xfs_repair on newer xfsprogs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 09, 2016 at 05:29:37PM +0800, Xiao Yang wrote:
> 于 2016/08/26 17:05, Zorro Lang 写道:
> >On Fri, Aug 26, 2016 at 02:10:00PM +0800, Xiao Yang wrote:
> >>On 2016/08/26 12:48, Zorro Lang wrote:
> >>>On Fri, Aug 26, 2016 at 11:36:52AM +0800, Xiao Yang wrote:
> >>>>Make sure xfs_repair can't clear the log by default when it is corrupted.
> >>>>xfs_repair always and only clear the log when the -L parameter is specified.
> >>>>This has updated by:
> >>>>Commit f2053bc ("xfs_repair: don't clear the log by default")
> >>>>
> >>>>Signed-off-by: Xiao Yang<yangx.jy@xxxxxxxxxxxxxx>
> >>>>---
> >>>>  common/rc     | 4 ++--
> >>>>  tests/xfs/098 | 2 +-
> >>>>  2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>
> >>>>diff --git a/common/rc b/common/rc
> >>>>index 3fb0600..c693a31 100644
> >>>>--- a/common/rc
> >>>>+++ b/common/rc
> >>>>@@ -1143,9 +1143,9 @@ _repair_scratch_fs()
> >>>Hi Xiao
> >>>
> >>>You should explain why you changed this function in commit log. Or
> >>>the reviewer can't understand why you change it.
> >>>
> >>>>      xfs)
> >>>>          _scratch_xfs_repair "$@" 2>&1
> >>>>  	res=$?
> >>>>-	if [ "$res" -eq 2 ]; then
> >>>>+	if [ "$res" -ne 0 ]; then
> >>>Hi Darrick,
> >>>
> >>>The xfs_repair manpage said:
> >>>xfs_repair run without the -n option will always return a status code of 0.
> >>>
> >>>I don't understand why you think it return 2 here? (Please check below)
> >>>
> >>Hi Zorro
> >>
> >>I don't understand why it return 2 here too.  I want to change this
> >>function because xfs_repair
> >>without -L option return 1 when log is corrupted on newer xfsprogs-dev.
> >>>>  		echo "xfs_repair returns $res; replay log?"
> >>>>-		_scratch_mount
> >>>>+		_scratch_mount 2>&1
> >>>>  		res=$?
> >>>>  		if [ "$res" -gt 0 ]; then
> >>>>  			echo "mount returns $res; zap log?"
> >>>>diff --git a/tests/xfs/098 b/tests/xfs/098
> >>>>index d91d617..eb33bb1 100755
> >>>>--- a/tests/xfs/098
> >>>>+++ b/tests/xfs/098
> >>>>@@ -93,7 +93,7 @@ echo "+ mount image"
> >>>>  _scratch_mount 2>/dev/null&&   _fail "mount should not succeed"
> >>>>
> >>>>  echo "+ repair fs"
> >>>>-_scratch_xfs_repair>>   $seqres.full 2>&1
> >>>>+_repair_scratch_fs>>   $seqres.full
> >You should print the stderr to $seqres.full too. Because in
> >"_repair_scratch_fs", its code likes below:
> >
> >     xfs)
> >         _scratch_xfs_repair "$@" 2>&1
> >>>>This repair won't clear the corrupted log anymore.
> >         res=$?
> >         if [ "$res" -eq 2 ]; then
> >                 echo "xfs_repair returns $res; replay log?"
> >                 _scratch_mount
> >>>>So this mount maybe failed if it can't deal with the corrupted log.
> >>>>If it print some error messages, it'll break the golden image of xfs/098
> >                 res=$?
> >                 if [ "$res" -gt 0 ]; then
> >                         echo "mount returns $res; zap log?"
> >                         _scratch_xfs_repair -L 2>&1
> >
> >
> >>>If just call xfs_repair without any options, the _repair_scratch_fs won't
> >>>help to call xfs_repair -L I think.
> >>>
> >>>So I think this patch won't fix the problem.
> >>>
> >>>Feel free to correct me, if I misunderstand something:)
> >>>
> >>>Thanks,
> >>>Zorro
> >>>
> >>If xfs_repair without any option succeed to repair filesystem when
> >>log is corrupted,
> >>_repair_scratch_fs don't need  to call  xfs_repair -L.  If it failed
> >>to repair filesystem,
> >>_repair_scratch_fs needs  to call  xfs_repair -L.
> >Oh, sorry, I just tried to run ths case. The "_scratch_xfs_repair" really return
> >non-zero when it try to repair a corrupted xfs...
> >
> >But the manpage(man xfs_repair) really said:
> >xfs_repair run without the -n option will always return a status code of 0.
> >
> >Maybe we should update the manpage? I'll check it later.
> >
> >Any way, there's still a problem in your patch, please see above:
> >
> >Thanks,
> >Zorro
> Hi Zorro
> Do you know why it returns 2 instead of 1 when we use xfs_repair
> without any options.
> I can't understand it, because it always return 1 on my machine.

Hi Xiao,

Please CC the mail list, there's no secret. And the most important
thing is if I said something wrong, others great developers maybe
glad to correct me:-P

I've asked DJ Wong about the return value of xfs_repair, and he
already replied:

"xfs_repair returns 2 when the log is corrupted, 1 when there's corruption left
to be fixed *or* some kind of operation error happened, and 0 if either it
found nothing wrong or all the corruptions were fixed."

I'm sure that email has been sent to you too.

If you can't understand why it return 1, you can check your xfs/098.full file,
you'll find:

"Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)

fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.
xfs_repair failed, err=1"

This output from below xfsprogs code:

        error = xlog_find_tail(log, &head_blk, &tail_blk);
        if (error) {
                do_warn(
                _("zero_log: cannot find log head/tail (xlog_find_tail=%d)\n"),
                        error);
                if (!no_modify && !zap_log)
>>> [exit from here] >>>    do_error(_(
"ERROR: The log head and/or tail cannot be discovered. Attempt to mount the\n"
"filesystem to replay the log or use the -L option to destroy the log and\n"
"attempt a repair.\n"));
        } else {
                if (verbose) {
                        do_warn(
        _("zero_log: head block %" PRId64 " tail block %" PRId64 "\n"),
                                head_blk, tail_blk);
                }
                if (!no_modify && head_blk != tail_blk) {
                        if (zap_log) {
                                do_warn(_(
"ALERT: The filesystem has valuable metadata changes in a log which is being\n"
"destroyed because the -L option was used.\n"));
                        } else {
                                do_warn(_(
"ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
"be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
"re-running xfs_repair.  If you are unable to mount the filesystem, then use\n"
"the -L option to destroy the log and attempt a repair.\n"
"Note that destroying the log may cause corruption -- please attempt a mount\n"
"of the filesystem before doing this.\n"));
                                exit(2);
                        }
                }
        }

I've marked [exit from here] for you. do_error will call exit(1). And the output
message already tell you the reason about why it fail.

You can keep reading, there's a "exit(2)" at the end of above code. I can't find
more exit(2) from xfsprogs/repair/ . So maybe this's the only one place which
can return 2. From the information above that exit(2), you can see that
xfs_repair will return 2 when it find there're some valuable metadata changes in
a log. It think a mount operation maybe can replay this log, so it return 2 and
suggest the user try to mount the filesystem. If mount can't replay the log, -L
is the next choice.

So I think the _repair_scratch_fs function in xfstests/common/rc doesn't think
about above situation. xfs_repair doesn't always return 2 if log corrupted.
Only xfs_repair feel log can be replay, it'll return 2, or it'll return 1. So
maybe we should change "if [ $res -eq 2 ]" to "if [ $res -ne 0 ]". Or we need
to change xfs_repair to make it return 2:-P

For xfs/098's problem, you can change the line#96:
from
_scratch_xfs_repair >> $seqres.full 2>&1
to
_repair_scratch_fs >> $seqres.full 2>&1

And _repair_scratch_fs need to be modified as I said above. I think I should write
a patch to describe the return value of xfs_repair(without -n). The current
xfs_repair manpage said:
"xfs_repair run without the -n option will always return a status code of 0."
it's wrong.

OK, I've talked too much. If anyone feel anything wrong, please corrent me:)

Thanks,
Zorro

> 
> Thanks,
> yang
> >>Thanks
> >>Xiao Yang.
> >>>>  echo "+ mount image (2)"
> >>>>  _scratch_mount
> >>>>-- 
> >>>>1.8.3.1
> >>>>
> >>>>
> >>>>
> >>>>--
> >>>>To unsubscribe from this list: send the line "unsubscribe fstests" in
> >>>>the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>.
> >>>
> >>
> >>
> >>--
> >>To unsubscribe from this list: send the line "unsubscribe fstests" in
> >>the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >.
> >
> 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux