On Mon, Apr 15, 2013 at 5:42 PM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote: > >> > +static void check_die_recursion(const char *fmt, va_list ap) >> > +{ >> > + static int dying; >> > + >> > + if (!dying++) >> > + return; >> > + >> > + vreportf("fatal: ", fmt, ap); >> >> How do you know it's safe to call vreportf() ? > > Because I hand-audited it. :) > But I think the more important question you > are getting at is: how do I know that it will remain safe to call > vreportf? Right. >> If the bug is in the vreportf code path, we will recurse infinitely >> (at least until the stack is used up). An implementation of vsnprintf >> exists in compat/snprintf.c for example. > > Right. My assumption was that we are primarily interested in protecting > against the die_routine. Compat functions should never be calling die. I think the rule we have been enforcing is less strict than that. We have only said that any compat function in a die handler path should never call die. But maybe that's what you meant. > Of course that is assuming nobody violates that rule, which is part of > the point of the check. > >> It's nice to print out the error message here, but I think doing so >> defeats the purpose of this "dying" check. Better to get the stack >> trace from a core dump. > > Easier said than done, sometimes, if you are debugging somebody else's > problem from a dump of a terminal session. :) > > But I can live with dropping this patch; my primary goal is the bug-fix > in patch three. > > I was also tempted to suggest just dropping the recursion check > altogether. While it is neat to detect such things, it's a "should never > happen" bug situation, and an infinite loop of printing out the same > message is pretty easy to notice. Do you remember what spurred the > original check? The message in cd163d4 doesn't say. That's a valid option. The primary motivation was that Hannes Sixt had to step in and point out yet again that the high-level memory allocators should not be called in anything that could be in a die handler code path. I was on the thread, but I don't remember the topic (ah, Jonathan has stepped in with the answer). I do remember that I was not the only one who had forgotten about that rule though. We didn't actually have someone report that they encountered infinite recursion, but it seemed easy enough to add a check for it, so... Unfortunately, I didn't realize that the async functions installed their own die handler which may not exit the process, allowing die to be called multiple times. To implement this check correctly/completely (i.e. detect recursion in the main thread as well as in any child threads), I think you really do need to use thread-local storage as you mentioned in 3/3 which could look something like: static pthread_key_t dying; static pthread_once_t dying_once = PTHREAD_ONCE_INIT; void setup_die_counter(void) { pthread_key_create(&dying, NULL); } check_die_recursion(void) { pthread_once(&dying_once, setup_die_counter); if (pthread(getspecific(dying)) { puts("BUG: recursion..."); exit(128); } pthread_setspecific(dying, &dying); } or maybe the setup could be performed in set_die_routine(), but it does kinda seem like overkill for a "nicety" like this. So maybe checking for recursion in just the main thread as this series does is better than nothing. -Brandon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html