Re: [RFC PATCH 1/3] test-tool: don't fake up BUG() exits as code 99

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 09 Jun 2022 10:09:25 +0200

On Wed, Jun 08 2022, Jeff King wrote:

> On Fri, Jun 03, 2022 at 02:05:49PM -0700, Junio C Hamano wrote:
>
>> > Are we sure that the reason no longer applies?  How do we know?  We
>> > would want to explain that to future developers in the proposed log
>> > message, I would think.
>> 
>> We can flip it the other way around.  
>> 
>> I do not think I ever saw anybody asked anybody on this list who got
>> a BUG() message to use the coredump to do something useful.  Don't
>> modern distros ship with "ulimit -c 0" these days?
>
> Certainly I have found the coredumps useful, though I would expect most
> Git developers to be able to run under a debugger and stop at BUG()
> anyway. So we could probably live without that, but...
>
>> It might be possible that a better direction is to introduce
>> GIT_ABORT_ON_BUG environment or core.abortOnBUG configuration that
>> chooses between abort() and exit(99), or something like that, and
>> then we switch to use the latter by default over time?
>
> It really should continue to die with a signal (or an exit code that
> pretends to be one) to continue triggering even under test_must_fail().
>
> IMHO the whole "let's trigger BUG() intentionally via test-tool" stuff
> in t1406 is misguided. It's literally introducing broken code that is
> not run in the normal Git binary in order to make sure that it's broken.

I haven't looked at that ref code in particular, but in general adding
coverage for the case we actually expect to defend against with BUG()
via a test-tool is something I think we could use in more cases.

E.g. parse-options.c will BUG() out on various bad options structs,
there it makes sense to feed those bad structs in to make sure our
assertion code is doing what we still expect, but we don't have those
tests.

> If that were the only spot, I'd suggest just getting rid of it entirely.
> But the code in t0210 that checks the handling of trace2 and BUG() is
> probably worth keeping.

Yes, we definitely want to test what *actually* happens as far as 

> I do agree that an environment variable would be a better selector than
> "this code is linked against test-tool". I thought so even back in:
>
>   https://lore.kernel.org/git/20180507090109.GA367@xxxxxxxxxxxxxxxxxxxxx/
>
> :) That message also covers the flip-side case discussed earlier in this
> thread: why calling abort() unconditionally in the test suite can be a
> pain.

I didn't know about that Jenkins case, but in any case I think we can
probably stop worrying about this given that we haven't had complaints
about ac14de13b22 (t4058: explore duplicate tree entry handling in a bit
more detail, 2020-12-11) (although I've noticed in in "dmesg" before).

I.e. since v2.31.0 running the test suite has produced at least 2
segfaults per run:

    $ (sudo dmesg -c; ./t4058-diff-duplicates.sh) >/dev/null; sudo dmesg | grep segfault
    [17687265.247252] git[11336]: segfault at 40 ip 0000000000700916 sp 00007ffc10d5dda0 error 4 in git[405000+33d000]
    [17687265.297027] git[11397]: segfault at 40 ip 0000000000700916 sp 00007ffd7dfd5310 error 4 in git[405000+33d000]

Perhaps those tests aren't valuable, but I think that shows that the
GIT_BUG_ABORT approach you suggested should probably be tweaked to
instead be some test prereq like SEGFAULT_AND_ABORT_OK.

On the other hand those segfaults in t4058 should probably be converted
to a BUG()...