Re: [PATCH 1/3] usage: refactor die-recursion checks

Brandon Casey <drafnel@xxxxxxxxx> · Mon, 15 Apr 2013 19:34:07 -0700

On Mon, Apr 15, 2013 at 5:42 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote:
>
>> > +static void check_die_recursion(const char *fmt, va_list ap)
>> > +{
>> > +       static int dying;
>> > +
>> > +       if (!dying++)
>> > +               return;
>> > +
>> > +       vreportf("fatal: ", fmt, ap);
>>
>> How do you know it's safe to call vreportf() ?
>
> Because I hand-audited it.

:)

> But I think the more important question you
> are getting at is: how do I know that it will remain safe to call
> vreportf?

Right.

>> If the bug is in the vreportf code path, we will recurse infinitely
>> (at least until the stack is used up). An implementation of vsnprintf
>> exists in compat/snprintf.c for example.
>
> Right. My assumption was that we are primarily interested in protecting
> against the die_routine. Compat functions should never be calling die.

I think the rule we have been enforcing is less strict than that.  We
have only said that any compat function in a die handler path should
never call die.  But maybe that's what you meant.

> Of course that is assuming nobody violates that rule, which is part of
> the point of the check.
>
>> It's nice to print out the error message here, but I think doing so
>> defeats the purpose of this "dying" check.  Better to get the stack
>> trace from a core dump.
>
> Easier said than done, sometimes, if you are debugging somebody else's
> problem from a dump of a terminal session. :)
>
> But I can live with dropping this patch; my primary goal is the bug-fix
> in patch three.
>
> I was also tempted to suggest just dropping the recursion check
> altogether. While it is neat to detect such things, it's a "should never
> happen" bug situation, and an infinite loop of printing out the same
> message is pretty easy to notice. Do you remember what spurred the
> original check? The message in cd163d4 doesn't say.

That's a valid option.

The primary motivation was that Hannes Sixt had to step in and point
out yet again that the high-level memory allocators should not be
called in anything that could be in a die handler code path.  I was on
the thread, but I don't remember the topic (ah, Jonathan has stepped
in with the answer).  I do remember that I was not the only one who
had forgotten about that rule though.

We didn't actually have someone report that they encountered infinite
recursion, but it seemed easy enough to add a check for it, so...

Unfortunately, I didn't realize that the async functions installed
their own die handler which may not exit the process, allowing die to
be called multiple times.

To implement this check correctly/completely (i.e. detect recursion in
the main thread as well as in any child threads), I think you really
do need to use thread-local storage as you mentioned in 3/3 which
could look something like:

   static pthread_key_t dying;
   static pthread_once_t dying_once = PTHREAD_ONCE_INIT;

   void setup_die_counter(void)
   {
           pthread_key_create(&dying, NULL);
   }

   check_die_recursion(void)
   {
           pthread_once(&dying_once, setup_die_counter);
           if (pthread(getspecific(dying)) {
                   puts("BUG: recursion...");
                   exit(128);
           }

           pthread_setspecific(dying, &dying);
   }

or maybe the setup could be performed in set_die_routine(), but it
does kinda seem like overkill for a "nicety" like this.  So maybe
checking for recursion in just the main thread as this series does is
better than nothing.

-Brandon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html