Re: [PATCH] usage: trace2 BUG() invocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 04, 2021 at 10:17:29PM -0800, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
> 
> > die() messages are traced in trace2, but BUG() messages are not. Anyone
> > tracking die() messages would have even more reason to track BUG().
> > Therefore, write to trace2 when BUG() is invoked.
> >
> > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> > ---
> > This was noticed when we observed at $DAYJOB that a certain BUG()
> > invocation [1] wasn't written to traces.
> >
> > [1] https://lore.kernel.org/git/YBn3fxFe978Up5Ly@xxxxxxxxxx/
> > ---
> >  t/helper/test-trace2.c   |  9 +++++++++
> >  t/t0210-trace2-normal.sh | 19 +++++++++++++++++++
> >  usage.c                  |  6 ++++++
> >  3 files changed, 34 insertions(+)
> 
> Sounds like a good idea.  Expert opinions?

I like the overall idea, but it does open the possibility of a BUG() in
the trace2 code looping infinitely.

For instance, injecting this bug:

  diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
  index 6353e8ad91..b386bae402 100644
  --- a/trace2/tr2_tgt_event.c
  +++ b/trace2/tr2_tgt_event.c
  @@ -242,6 +242,7 @@ static void fn_error_va_fl(const char *file, int line, const char *fmt,
          if (fmt && *fmt)
                  jw_object_string(&jw, "fmt", fmt);
          jw_end(&jw);
  +       jw_end(&jw);
   
          tr2_dst_write_line(&tr2dst_event, &jw.json);
          jw_release(&jw);

and running something like:

  GIT_TRACE2_EVENT=1 ./git config --file=does.not.exist --list

currently yields:

  [...a bunch of trace2 lines...]
  Aborted
  BUG: json-writer.c:397: json-writer: too many jw_end(): '{"event":"error",[...etc...]

With this patch it loops forever (well, probably until it runs out of
memory, but I stopped it after one minute and 10G of RAM). ;)

Presumably triggering this in practice would be pretty rare.  But then,
the point of BUG() is to find things we do not expect to happen.

We've had a similar problem on the die() side in the past, and solved it
with a recursion flag. But note it gets a bit non-trivial in the face of
threads. There's some discussion in 1ece66bc9e (run-command: use
thread-aware die_is_recursing routine, 2013-04-16).

That commit talks about a case where "die()" in a thread takes down the
thread but not the whole process. That wouldn't be true here (we'd
expect BUG() to take everything down). So a single counter might be OK
in practice, though I suspect we could trigger the problem racily
Likewise this is probably a lurking problem when other threaded code
calls die(), but we just don't do that often enough for anybody to have
noticed.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux