Re: [PATCH] usage: trace2 BUG() invocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/5/2021 4:01 AM, Jeff King wrote:
> On Thu, Feb 04, 2021 at 10:17:29PM -0800, Junio C Hamano wrote:
> 
>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>>
>>> die() messages are traced in trace2, but BUG() messages are not. Anyone
>>> tracking die() messages would have even more reason to track BUG().
>>> Therefore, write to trace2 when BUG() is invoked.
>>>
>>> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
>>> ---
>>> This was noticed when we observed at $DAYJOB that a certain BUG()
>>> invocation [1] wasn't written to traces.
>>>
>>> [1] https://lore.kernel.org/git/YBn3fxFe978Up5Ly@xxxxxxxxxx/
>>> ---
>>>  t/helper/test-trace2.c   |  9 +++++++++
>>>  t/t0210-trace2-normal.sh | 19 +++++++++++++++++++
>>>  usage.c                  |  6 ++++++
>>>  3 files changed, 34 insertions(+)
>>
>> Sounds like a good idea.  Expert opinions?
> 
> I like the overall idea, but it does open the possibility of a BUG() in
> the trace2 code looping infinitely.

I also like the idea. This infinite loop is scary.

> We've had a similar problem on the die() side in the past, and solved it
> with a recursion flag. But note it gets a bit non-trivial in the face of
> threads. There's some discussion in 1ece66bc9e (run-command: use
> thread-aware die_is_recursing routine, 2013-04-16).
> 
> That commit talks about a case where "die()" in a thread takes down the
> thread but not the whole process. That wouldn't be true here (we'd
> expect BUG() to take everything down). So a single counter might be OK
> in practice, though I suspect we could trigger the problem racily
> Likewise this is probably a lurking problem when other threaded code
> calls die(), but we just don't do that often enough for anybody to have
> noticed.

Would a simple "BUG() has been called" static suffice?

diff --git a/usage.c b/usage.c
index 1868a24f7a..0d2408f79e 100644
--- a/usage.c
+++ b/usage.c
@@ -265,7 +265,11 @@ int BUG_exit_code;
 
 static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_list params)
 {
+       static int in_bug = 0;
        char prefix[256];
+       if (in_bug)
+               abort();
+       in_bug = 1;
 
        /* truncation via snprintf is OK here */
        if (file)

Note that the NOTRETURN means we can't no-op with something like

	if (in_bug)
		return;

so the trace2 call would want to be as close to the abort as
possible to avoid a silent failure. So, in the patch...

>>> diff --git a/usage.c b/usage.c
>>> index 1868a24f7a..16272c5348 100644
>>> --- a/usage.c
>>> +++ b/usage.c
>>> @@ -273,6 +273,12 @@ static NORETURN void BUG_vfl(const char *file, int line, const char *fmt, va_lis
>>>  	else
>>>  		snprintf(prefix, sizeof(prefix), "BUG: ");
>>>  
>>> +	/*
>>> +	 * We call this trace2 function first and expect it to va_copy 'params'
>>> +	 * before using it (because an 'ap' can only be walked once).
>>> +	 */
>>> +	trace2_cmd_error_va(fmt, params);
>>> +
>>>  	vreportf(prefix, fmt, params);

We would want this vreportf() to be before the call to
trace2_cmd_error_va(), right?

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux