On 2/5/2021 4:01 AM, Jeff King wrote: > On Thu, Feb 04, 2021 at 10:17:29PM -0800, Junio C Hamano wrote: > >> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes: >> >>> die() messages are traced in trace2, but BUG() messages are not. Anyone >>> tracking die() messages would have even more reason to track BUG(). >>> Therefore, write to trace2 when BUG() is invoked. >>> >>> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> >>> --- >>> This was noticed when we observed at $DAYJOB that a certain BUG() >>> invocation [1] wasn't written to traces. >>> >>> [1] https://lore.kernel.org/git/YBn3fxFe978Up5Ly@xxxxxxxxxx/ >>> --- >>> t/helper/test-trace2.c | 9 +++++++++ >>> t/t0210-trace2-normal.sh | 19 +++++++++++++++++++ >>> usage.c | 6 ++++++ >>> 3 files changed, 34 insertions(+) >> >> Sounds like a good idea. Expert opinions? > > I like the overall idea, but it does open the possibility of a BUG() in > the trace2 code looping infinitely. I also like the idea. This infinite loop is scary. > We've had a similar problem on the die() side in the past, and solved it > with a recursion flag. But note it gets a bit non-trivial in the face of > threads. There's some discussion in 1ece66bc9e (run-command: use > thread-aware die_is_recursing routine, 2013-04-16). > > That commit talks about a case where "die()" in a thread takes down the > thread but not the whole process. That wouldn't be true here (we'd > expect BUG() to take everything down). So a single counter might be OK > in practice, though I suspect we could trigger the problem racily > Likewise this is probably a lurking problem when other threaded code > calls die(), but we just don't do that often enough for anybody to have > noticed. Would a simple "BUG() has been called" static suffice? diff --git a/usage.c b/usage.c index 1868a24f7a..0d2408f79e 100644 --- a/usage.c +++ b/usage.c @@ -265,7 +265,11 @@ int BUG_exit_code; static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_list params) { + static int in_bug = 0; char prefix[256]; + if (in_bug) + abort(); + in_bug = 1; /* truncation via snprintf is OK here */ if (file) Note that the NOTRETURN means we can't no-op with something like if (in_bug) return; so the trace2 call would want to be as close to the abort as possible to avoid a silent failure. So, in the patch... >>> diff --git a/usage.c b/usage.c >>> index 1868a24f7a..16272c5348 100644 >>> --- a/usage.c >>> +++ b/usage.c >>> @@ -273,6 +273,12 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis >>> else >>> snprintf(prefix, sizeof(prefix), "BUG: "); >>> >>> + /* >>> + * We call this trace2 function first and expect it to va_copy 'params' >>> + * before using it (because an 'ap' can only be walked once). >>> + */ >>> + trace2_cmd_error_va(fmt, params); >>> + >>> vreportf(prefix, fmt, params); We would want this vreportf() to be before the call to trace2_cmd_error_va(), right? Thanks, -Stolee