Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 29 Mar 2021 15:25:09 +0200

On Sun, Mar 28 2021, Jeff King wrote:

> On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>> 
>> > Add a bug() function that works like error() except the message is
>> > prefixed with "bug:" instead of "error:".
>> >
>> > The reason this is needed is for e.g. the fsck code. If we encounter
>> > what we'd consider a BUG() in the middle of fsck traversal we'd still
>> > like to try as hard as possible to go past that object and complete
>> > the fsck, instead of hard dying. A follow-up commit will introduce
>> > such a use in object-file.c.
>> 
>> Reading the description above, i.e. "to go past that object", the
>> assumed use case seems to be to deal with a data error, not a
>> program bug (which is where we use BUG()---e.g. one helper function
>> in the fsck code detected that the caller wasn't careful enough to
>> vet the data it has and called it with incoherent data).  If we find
>> a tree entry whose mode bits implies that the object recorded in the
>> entry ought to be a blob, and later find out that the object turns
>> out to be a tree, that is a corrupt repository and the code that
>> detected is not buggy (and we shouldn't use BUG(), of course).
>> 
>> So, ... I am skeptical.  If the code is prepared to handle breakage,
>> we would not want to die, but then I am not sure why it has to be
>> different from error().
>
> Yeah, this seems like it is missing the point of BUG() completely.  I
> took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> like it should really be an error() return or similar. The root cause
> would be open_istream() on a loose object failing (which might be
> corruption, or might even be a transient OS error!).

I don't feel strongly about this bug() thing, I'll drop it if you two
don't like it.

But that's not why I added it, yes you can now carefully read the code
and reason that this code is unreachable now, as I think it is.

But it may not stay that way, refactoring how we handle I/O errors
etc. further down the stack is the sort of thing that if this bug()
wasn't there would cause us to otherwise silently lose the
error. I.e. does check_object_signature() always promise to return
non-zero *only* if the signature isn't OK?

So maybe we are happy to just make that promise, in which case yes, this
should/could be an error() in this case.

But isn't this also useful for multi-threaded code? E.g. let's say fsck
learns to map-reduce its fsck-ing of objects across threads. One of them
encounters a BUG(). Do we want to hard kill the whole thing or try to
limp ahead and report partial results from the other thread(s)?

We have than now with pack-objects/grep, but I'm struggling to find a
use-case for a partial grep result if e.g. PCRE fails with a BUG(...)
...