Re: [PATCH v2] xfs_repair: update the manual content about xfs_repair exit status

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 14 Sep 2016 11:34:22 +1000

On Tue, Sep 13, 2016 at 04:52:32PM -0500, Eric Sandeen wrote:
> 
> 
> On 9/13/16 4:48 PM, Dave Chinner wrote:
> > On Tue, Sep 13, 2016 at 11:57:59AM -0500, Eric Sandeen wrote:
> >> On 9/13/16 11:32 AM, Darrick J. Wong wrote:
> 
> ...
> 
> >>> So... I'd rather the documentation about the return code reflect the
> >>> status of the filesystem -- 2 means "unclean log, replay it or zap it",
> >>> 1 means "errors encountered, fs may not be correct", and 0 /should/ mean
> >>> "fs is correct".
> >>>
> >>> OTOH I don't know for sure that xfs_repair always cleans up the fs on
> >>> the first try.
> >>
> >> That's certainly the intent; I can't imagine a manpage documenting
> >> return codes qualified with "... unless bugs happen." :)
> > 
> > Right - if we hit bugs, all bets are off. But otherwise, the fs
> > should be repaired and clean after a single pass.
> > 
> >>>  ISTR
> >>> asking Dave about this, and I think he said that the FS should be clean
> >>> if repair returns 0.  But I'll let him reiterate that if it's true;
> >>> don't trust my crummy memory, that's why I have filesystems. ;)
> >>
> >> Did you have an alternate wording in mind?
> > 
> > Yup, 0 = "fs is clean", 1 = "fs is still b0rken",
> > 2 = "couldn't run for whatever reason given"
> 
> Technically, 1 = "may or may not be broken" - we really don't know.
> We could get an exit of 1 for a consistent filesystem, for example
> if some allocation failed... all we know is something bonked out in
> the middle.
> 
> Maybe "1 == xfs_repair did not run to completion?"

Well, if it fails part way through phase 5, then the filesystem is
most definitely broken, even if it was clean to begin with. That is,
repair will rebuild parts of the filesystem from scratch even when
the filesystem is clean.

And repair nulls out directory entries in phase 4 and doesn't
rebuild those directories till phase 6, so between those points the
filesystem is actually in a corrupt state that requires repair.
So there is a large window in which a failure in repair really does
mean that we need to run repair again. Hence I think it's simply
safer to explicitly document it as:

	"1 == fs may be even more broken than before repair started,
	so repair needs to be run again"

because "did not run to completion" does not really tell the user
what to do when it occurs.
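
To illustrate why the wording matters, here is a rough sketch of how a
wrapper script might act on the exit status. It assumes the meanings
proposed in this thread (0 = clean, 1 = run repair again, 2 = couldn't
run for the reason given), which may not match the final manpage text.
The names interpret_exit_status, run_repair and the action labels are
hypothetical, made up for this example.

```python
import subprocess

# Meanings as proposed in this thread, NOT the final manpage wording.
# The action labels ("clean", "rerun", "blocked") are hypothetical.
PROPOSED_MEANING = {
    0: ("clean", "fs is clean; nothing more to do"),
    1: ("rerun", "fs may be even more broken than before repair "
                 "started; run repair again"),
    2: ("blocked", "repair couldn't run for the reason it reported; "
                   "deal with that, then retry"),
}


def interpret_exit_status(status):
    """Map an xfs_repair exit status to an (action, explanation) pair."""
    return PROPOSED_MEANING.get(
        status, ("unknown", "status not covered by the proposed wording"))


def run_repair(device, runner=subprocess.run):
    """Run xfs_repair on device and interpret its exit status.

    runner is injectable so the logic can be exercised without
    touching a real block device.
    """
    result = runner(["xfs_repair", device])
    return interpret_exit_status(result.returncode)
```

With "1 == did not run to completion" the script author has to guess
at the "rerun" action; with the wording above the action follows
directly from the documented status.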

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx
