Re: [PATCH] xfs_repair: update the manual content about xfs_repair exit status

Zorro Lang <zlang@xxxxxxxxxx> · Tue, 13 Sep 2016 22:44:05 +0800

On Mon, Sep 12, 2016 at 11:01:12AM -0500, Eric Sandeen wrote:
> On 9/9/16 11:47 PM, Zorro Lang wrote:
> > The man 8 xfs_repair said "xfs_repair run without the -n option will
> > always return a status code of 0". That's not correct.
> > 
> > xfs_repair will return 2 if it find valuable metadata changes in log
> > which needs to be replayed, 1 if it can't fix the corruption or some
> > other errors happened and 0 if nothing wrong or all the corruptions
> > were fixed.
> > 
> > Generally xfs_repair -L will always return 0, except it can't clear
> > the log.
> 
> And I think that's an operational type error, not the result
> of a filesystem problem; more like an IO error, or a code bug,
> I *think* ... more below.
> 
> 
> > Signed-off-by: Zorro Lang <zlang@xxxxxxxxxx>
> > ---
> > 
> > Hi,
> > 
> > I  trusted the xfs_repair manpage, and thought xfs_repair will always return 0.
> > But recently I found it lies when I tried to review someone xfstests case.
> > 
> > A correct manpage will help more people to write right cases, so I try to modify
> > the manpage, by search all exit/do_error in xfsprogs/repair. I'm not the best
> > one who learn about xfs_repair, so I just hope I did the right thing:-P Please
> > feel free to correct me.
> > 
> > Thanks,
> > Zorro
> > 
> >  man/man8/xfs_repair.8 | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> > 
> > diff --git a/man/man8/xfs_repair.8 b/man/man8/xfs_repair.8
> > index 1b4d9e3..1f8f13b 100644
> > --- a/man/man8/xfs_repair.8
> > +++ b/man/man8/xfs_repair.8
> > @@ -504,12 +504,23 @@ that is known to be free. The entry is therefore invalid and is deleted.
> >  This message refers to a large directory.
> >  If the directory were small, the message would read "junking entry ...".
> >  .SH EXIT STATUS
> > +.TP
> >  .B xfs_repair \-n
> >  (no modify node)
> >  will return a status of 1 if filesystem corruption was detected and
> >  0 if no filesystem corruption was detected.
> > +.TP
> >  .B xfs_repair
> > -run without the \-n option will always return a status code of 0.
> > +run without the \-n option will return a status code of 2 if it find the
> > +filesystem has valuable metadata changes in log which needs to be
> > +replayed, 1 if there's corruption left to be fixed
> 
> I'm not sure that's the best description; from a quick look, I think
> those exit values of 1 result from do_error(), and in repair that's
> (usually?) due to something like a memory allocation failure, or an
> inconsistent state in the tool; more like hitting an ASSERT.  That might
> leave corruption, but only as a follow-on effect.

Hi Eric,

Many thanks for helping to review this patch.

I've checked all the code paths that call exit(1). Generally they are caused
by memory or disk errors, but some other situations, such as:
 - Not enough matching AGs or superblocks
 - Primary superblock bad after phase 1
 - Sector size on host filesystem larger than image sector size, when trying
   to repair a file image
 ...

will exit(1) too.

But yes, they all count as runtime errors :) There are too many situations
that can return 1, but only one place can return 2, so we can say that apart
from the 0 and 2 cases, everything else returns 1 :-P
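
To make the summary above concrete, here is a rough sketch of how a test
case might interpret the status. The helper name describe_exit_status and
its strings are made up for illustration (nothing like it exists in
xfsprogs), and the mapping only reflects the wording being discussed in this
thread, not a final manpage. For -n it assumes only the 0 and 1 cases the
current manpage documents.

```python
# Hypothetical helper, not part of xfsprogs: maps an xfs_repair exit status
# to the meaning discussed in this thread.
def describe_exit_status(status, no_modify=False):
    """Return a short description of an xfs_repair exit status.

    no_modify=True corresponds to running xfs_repair with -n.
    """
    if no_modify:
        # With -n the manpage documents only 0 and 1 (assumption: anything
        # else is treated as unexpected here).
        if status == 0:
            return "no corruption detected"
        if status == 1:
            return "corruption detected"
        return "unexpected status"

    if status == 0:
        # Either nothing was wrong or the corruption was repaired.
        return "completed"
    if status == 2:
        # The log has to be dealt with before repair can proceed.
        return "log needs to be replayed or cleared"
    if status == 1:
        # Memory/disk errors, bad primary superblock, and so on.
        return "runtime error"
    return "unexpected status"
```

A test would run xfs_repair, pass its exit status to the helper, and treat
anything other than "completed" as a reason to stop and look closer.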

>
> > + or can't find log head
> > +and tail or some other errors happened, 
> 
> Which is the same as above, I think - an internal error.
> 
> > and 0 if nothing wrong or all the
> > +corruptions were fixed.
> > +.TP
> > +.B xfs_repair \-L
> > +(Force Log Zeroing)
> > +will return a status code of 1 if it can't clear the log, or will always
> > +return 0.
> 
> 
> How about something like this:
> 
>  .B xfs_repair \-n
>  (no modify node)
>  will return a status of 1 if filesystem corruption was detected and
>  0 if no filesystem corruption was detected.
>  .TP
>  .B xfs_repair
>  run without the \-n option will return a status code of 2 if it finds a
>  filesystem log which needs to be replayed (by a mount/umount cycle), 1 if
>  a runtime error is encountered, and 0 in all other cases, whether or not
>  filesystem corruption was detected.

Your patch (xfs_repair: exit with status 2 if log dirtiness is unknown) will
make xfs_repair return 2 when it can't find the log head/tail. I think
xfs_repair can't know that the log needs to be replayed if it can't find the
log head/tail.

So how about "return a status code of 2 if it finds that the filesystem log
needs to be replayed or cleared"?

Thanks,
Zorro

> 
> and I'd leave out the bit about xfs_repair -L; really that's just a runtime
> error - if we clear the log and then can't find the head/tail, something
> strange has gone wrong.
> 
> Thanks,
> 
> -Eric
> 
> >  .SH BUGS
> >  The filesystem to be checked and repaired must have been
> >  unmounted cleanly using normal system administration procedures
> > 
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
