Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()

Jeff King <peff@xxxxxxxx> · Wed, 31 Mar 2021 07:06:10 -0400

On Mon, Mar 29, 2021 at 03:25:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, this seems like it is missing the point of BUG() completely.  I
> > took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> > like it should really be an error() return or similar. The root cause
> > would be open_istream() on a loose object failing (which might be
> > corruption, or might even be a transient OS error!).
> 
> I don't feel strongly about this bug() thing, I'll drop it if you two
> don't like it.
> 
> But that's not why I added it, yes you can now carefully read the code
> and reason that this code is unreachable now, as I think it is.
> 
> But it may not stay that way, refactoring how we handle I/O errors
> etc. further down the stack is the sort of thing that if this bug()
> wasn't there would cause us to otherwise silently lose the
> error. I.e. does check_object_signature() always promise to return
> non-zero *only* if the signature isn't OK?
> 
> So maybe we are happy to just make that promise, in which case yes, this
> should/could be an error() in this case.

I didn't dig into what check_object_signature() promises, but I don't
think it matters for my argument. If the case you are looking at can be
triggered by bad on-disk data, transient OS errors, etc, then it should
be an error() or a die(), or whatever is appropriate for the code. If it
is meant to be an invariant of the code that it should never trigger,
then it should be a BUG(), so that we loudly inform people that the
code's assumption has been violated.

But I do not see any point in a bug() that does not abort(). The point
of BUG() is that nobody is supposed to see it, and we should be as loud
as possible if we do.

And if there is a call site that is in doubt about what it may be fed,
then it should just be an error() or die().

> But isn't this also useful for multi-threaded code? E.g. let's say fsck
> learns to map-reduce its fsck-ing of objects across threads. One of them
> encounters a BUG(). Do we want to hard kill the whole thing or try to
> limp ahead and report partial results from the other thread(s)?

Yes, we want to hard kill. The point of BUG() is that it is not supposed
to happen, and there is no point in limping further.

-Peff