Re: [PATCH v2] xfs/098: fix xfs_repair on newer xfsprogs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



于 2016/09/09 20:28, Zorro Lang 写道:
> On Fri, Sep 09, 2016 at 05:29:37PM +0800, Xiao Yang wrote:
>> 于 2016/08/26 17:05, Zorro Lang 写道:
>>> On Fri, Aug 26, 2016 at 02:10:00PM +0800, Xiao Yang wrote:
>>>> On 2016/08/26 12:48, Zorro Lang wrote:
>>>>> On Fri, Aug 26, 2016 at 11:36:52AM +0800, Xiao Yang wrote:
>>>>>> Make sure xfs_repair can't clear the log by default when it is corrupted.
>>>>>> xfs_repair always and only clear the log when the -L parameter is specified.
>>>>>> This has updated by:
>>>>>> Commit f2053bc ("xfs_repair: don't clear the log by default")
>>>>>>
>>>>>> Signed-off-by: Xiao Yang<yangx.jy@xxxxxxxxxxxxxx>
>>>>>> ---
>>>>>>  common/rc     | 4 ++--
>>>>>>  tests/xfs/098 | 2 +-
>>>>>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/common/rc b/common/rc
>>>>>> index 3fb0600..c693a31 100644
>>>>>> --- a/common/rc
>>>>>> +++ b/common/rc
>>>>>> @@ -1143,9 +1143,9 @@ _repair_scratch_fs()
>>>>> Hi Xiao
>>>>>
>>>>> You should explain why you changed this function in commit log. Or
>>>>> the reviewer can't understand why you change it.
>>>>>
>>>>>>      xfs)
>>>>>>          _scratch_xfs_repair "$@" 2>&1
>>>>>>  	res=$?
>>>>>> -	if [ "$res" -eq 2 ]; then
>>>>>> +	if [ "$res" -ne 0 ]; then
>>>>> Hi Darrick,
>>>>>
>>>>> The xfs_repair manpage said:
>>>>> xfs_repair run without the -n option will always return a status code of 0.
>>>>>
>>>>> I don't understand why you think it return 2 here? (Please check below)
>>>>>
>>>> Hi Zorro
>>>>
>>>> I don't understand why it return 2 here too.  I want to change this
>>>> function because xfs_repair
>>>> without -L option return 1 when log is corrupted on newer xfsprogs-dev.
>>>>>>  		echo "xfs_repair returns $res; replay log?"
>>>>>> -		_scratch_mount
>>>>>> +		_scratch_mount 2>&1
>>>>>>  		res=$?
>>>>>>  		if [ "$res" -gt 0 ]; then
>>>>>>  			echo "mount returns $res; zap log?"
>>>>>> diff --git a/tests/xfs/098 b/tests/xfs/098
>>>>>> index d91d617..eb33bb1 100755
>>>>>> --- a/tests/xfs/098
>>>>>> +++ b/tests/xfs/098
>>>>>> @@ -93,7 +93,7 @@ echo "+ mount image"
>>>>>>  _scratch_mount 2>/dev/null&&   _fail "mount should not succeed"
>>>>>>
>>>>>>  echo "+ repair fs"
>>>>>> -_scratch_xfs_repair>>   $seqres.full 2>&1
>>>>>> +_repair_scratch_fs>>   $seqres.full
>>> You should print the stderr to $seqres.full too. Because in
>>> "_repair_scratch_fs", its code likes below:
>>>
>>>     xfs)
>>>         _scratch_xfs_repair "$@" 2>&1
>>>>>> This repair won't clear the corrupted log anymore.
>>>         res=$?
>>>         if [ "$res" -eq 2 ]; then
>>>                 echo "xfs_repair returns $res; replay log?"
>>>                 _scratch_mount
>>>>>> So this mount maybe failed if it can't deal with the corrupted log.
>>>>>> If it print some error messages, it'll break the golden image of xfs/098
>>>                 res=$?
>>>                 if [ "$res" -gt 0 ]; then
>>>                         echo "mount returns $res; zap log?"
>>>                         _scratch_xfs_repair -L 2>&1
>>>
>>>
>>>>> If just call xfs_repair without any options, the _repair_scratch_fs won't
>>>>> help to call xfs_repair -L I think.
>>>>>
>>>>> So I think this patch won't fix the problem.
>>>>>
>>>>> Feel free to correct me, if I misunderstand something:)
>>>>>
>>>>> Thanks,
>>>>> Zorro
>>>>>
>>>> If xfs_repair without any option succeed to repair filesystem when
>>>> log is corrupted,
>>>> _repair_scratch_fs don't need  to call  xfs_repair -L.  If it failed
>>>> to repair filesystem,
>>>> _repair_scratch_fs needs  to call  xfs_repair -L.
>>> Oh, sorry, I just tried to run ths case. The "_scratch_xfs_repair" really return
>>> non-zero when it try to repair a corrupted xfs...
>>>
>>> But the manpage(man xfs_repair) really said:
>>> xfs_repair run without the -n option will always return a status code of 0.
>>>
>>> Maybe we should update the manpage? I'll check it later.
>>>
>>> Any way, there's still a problem in your patch, please see above:
>>>
>>> Thanks,
>>> Zorro
>> Hi Zorro
>> Do you know why it returns 2 instead of 1 when we use xfs_repair
>> without any options.
>> I can't understand it, because it always return 1 on my machine.
> Hi Xiao,
>
> Please CC the mail list, there's no secret. And the most important
> thing is if I said something wrong, others great developers maybe
> glad to correct me:-P
>
> I've asked DJ Wong about the return value of xfs_repair, and he
> already replied:
>
> "xfs_repair returns 2 when the log is corrupted, 1 when there's corruption left
> to be fixed *or* some kind of operation error happened, and 0 if either it
> found nothing wrong or all the corruptions were fixed."
>
> I'm sure that email has been sent to you too.
>
> If you can't understand why it return 1, you can check your xfs/098.full file,
> you'll find:
>
> "Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
> Log inconsistent (didn't find previous header)
> failed to find log head
> zero_log: cannot find log head/tail (xlog_find_tail=5)
>
> fatal error -- ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
> filesystem to replay the log or use the -L option to destroy the log and
> attempt a repair.
> xfs_repair failed, err=1"
>
> This output from below xfsprogs code:
>
>         error = xlog_find_tail(log, &head_blk, &tail_blk);
>         if (error) {
>                 do_warn(
>                 _("zero_log: cannot find log head/tail (xlog_find_tail=%d)\n"),
>                         error);
>                 if (!no_modify && !zap_log)
>>>> [exit from here] >>>    do_error(_(
> "ERROR: The log head and/or tail cannot be discovered. Attempt to mount the\n"
> "filesystem to replay the log or use the -L option to destroy the log and\n"
> "attempt a repair.\n"));
>         } else {
>                 if (verbose) {
>                         do_warn(
>         _("zero_log: head block %" PRId64 " tail block %" PRId64 "\n"),
>                                 head_blk, tail_blk);
>                 }
>                 if (!no_modify && head_blk != tail_blk) {
>                         if (zap_log) {
>                                 do_warn(_(
> "ALERT: The filesystem has valuable metadata changes in a log which is being\n"
> "destroyed because the -L option was used.\n"));
>                         } else {
>                                 do_warn(_(
> "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
> "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
> "re-running xfs_repair.  If you are unable to mount the filesystem, then use\n"
> "the -L option to destroy the log and attempt a repair.\n"
> "Note that destroying the log may cause corruption -- please attempt a mount\n"
> "of the filesystem before doing this.\n"));
>                                 exit(2);
>                         }
>                 }
>         }
>
> I've marked [exit from here] for you. do_error will call exit(1). And the output
> message already tell you the reason about why it fail.
>
> You can keep reading, there's a "exit(2)" at the end of above code. I can't find
> more exit(2) from xfsprogs/repair/ . So maybe this's the only one place which
> can return 2. From the information above that exit(2), you can see that
> xfs_repair will return 2 when it find there're some valuable metadata changes in
> a log. It think a mount operation maybe can replay this log, so it return 2 and
> suggest the user try to mount the filesystem. If mount can't replay the log, -L
> is the next choice.
>
> So I think the _repair_scratch_fs function in xfstests/common/rc doesn't think
> about above situation. xfs_repair doesn't always return 2 if log corrupted.
> Only xfs_repair feel log can be replay, it'll return 2, or it'll return 1. So
> maybe we should change "if [ $res -eq 2 ]" to "if [ $res -ne 0 ]". Or we need
> to change xfs_repair to make it return 2:-P
>
> For xfs/098's problem, you can change the line#96:
> from
> _scratch_xfs_repair >> $seqres.full 2>&1
> to
> _repair_scratch_fs >> $seqres.full 2>&1
>
> And _repair_scratch_fs need to be modified as I said above. I think I should write
> a patch to describe the return value of xfs_repair(without -n). The current
> xfs_repair manpage said:
> "xfs_repair run without the -n option will always return a status code of 0."
> it's wrong.
>
> OK, I've talked too much. If anyone feel anything wrong, please corrent me:)
>
> Thanks,
> Zorro
>
Thanks for your comment, so i will rewrite this patch.

Thanks,
Xiao Yang
>> Thanks,
>> yang
>>>> Thanks
>>>> Xiao Yang.
>>>>>>  echo "+ mount image (2)"
>>>>>>  _scratch_mount
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>> .
>>>>>
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> .
>>>
>>
>>
>
> .
>



--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux