On Wed, Oct 5, 2016 at 12:18 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Oct 5, 2016 at 12:06 PM, Willy Tarreau <w@xxxxxx> wrote:
>>
>> I have the same doubts, so at least I would not want to run the "sed"
>> immediately, at least to keep the initial intent. But I think everyone
>> is right in his own yard when he puts a BUG_ON() when he doesn't know
>> how to handle an unsafe situation, he's wrong from a global perspective.
>
> Yes. And as you say, even when the developer might be right in some
> situations, you'd easily still be wrong for the same code in some
> other situation.

I just want to chime in and confirm that we really don't want to
wholesale replace BUG with WARN. Most situations using BUG (whether or
not they should be) are totally unprepared to continue execution, which
means we'd just get some memory trap or bizarre crash after the WARN
instead of the "clean" BUG behavior.

> Quite frankly, I wouldn't do a sed-script pass to actually change
> existing users. I'd just change how the BUG() implementation itself
> works. Not make it a direct WARN_ON(), but perhaps something like
>
>  - use WARN_ON() with a global rate limiter (we do *not* want BUG
>    cascades, but re-enable the warning after a few minutes)
>
>  - have some kernel command line option for the server people to allow
>    them to just force a reboot for it
>
> Hmm?
>
> Anybody want to play with it?

We absolutely have a granularity problem, but we have to retain the
no-continued-execution nature of BUG() users.

The problem with BUG() is that it is so context-sensitive. In the case
you hit, killing the process and continuing life fundamentally failed
and the entire system fell over. That wasn't the intent, obviously, but
that BUG() got effectively "promoted" to panic().

The cases where I've used BUG() are entirely about doing two things:
reporting the current state of the CPU and call stack, and killing the
process.
(And I'd like to add a third: passing a meaningful string, which right
now has to happen with a separate pr_*() call that appears outside the
"cut here" line that x86 produces on a BUG.)

Now, it can be argued that the "kill the process" part should be
configurable and that the code should be written to handle a WARN,
clean up, and error out nicely. But I still want to retain the "kill
the process immediately" behavior in some capacity.

The implementation of BUG is also arch-specific, which makes it
frustrating to change.

So, maybe another question is "when does BUG kill the system and not
just the process?" And can we detect these cases like we already detect
bad locking, interrupt contexts, etc? (Is this question going to have
an arch-specific answer?)

-Kees

-- 
Kees Cook
Nexus Security
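
To make the policy discussed above concrete, here is a rough userspace
sketch, not kernel code. It combines the two quoted ideas (a global
rate limit on reports, and an opt-in "just reboot" switch) with the
requirement that the offending task never continues. Every name in it
(bug_policy, bug_decide, the BUG_ACTION_* values, the 300 second window)
is hypothetical and chosen only for illustration; none of it is existing
kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of hitting a BUG(); not an existing kernel type. */
enum bug_action {
	BUG_ACTION_REBOOT,	/* opt-in switch set: take the machine down */
	BUG_ACTION_REPORT_KILL,	/* dump CPU state and stack, then kill the task */
	BUG_ACTION_QUIET_KILL,	/* inside the quiet window: kill, skip the report */
};

/* Hypothetical global policy state. */
struct bug_policy {
	bool reboot_on_bug;	/* stands in for a kernel command line option */
	long quiet_secs;	/* how long full reports stay suppressed */
	long last_report;	/* timestamp of the last full report */
	bool reported_once;	/* false until the first report is emitted */
};

/*
 * Decide what to do for one BUG() hit at time 'now' (in seconds).
 * Only the report is rate limited; execution never continues in the
 * offending task, whichever action is returned.
 */
static enum bug_action bug_decide(struct bug_policy *p, long now)
{
	if (p->reboot_on_bug)
		return BUG_ACTION_REBOOT;

	if (p->reported_once && now - p->last_report < p->quiet_secs)
		return BUG_ACTION_QUIET_KILL;

	/* First hit, or the quiet window has expired: report again. */
	p->reported_once = true;
	p->last_report = now;
	return BUG_ACTION_REPORT_KILL;
}

/* Usage example: a cascade of hits produces a single report. */
static void bug_policy_example(void)
{
	struct bug_policy p = { .reboot_on_bug = false, .quiet_secs = 300 };

	assert(bug_decide(&p, 1000) == BUG_ACTION_REPORT_KILL);
	assert(bug_decide(&p, 1010) == BUG_ACTION_QUIET_KILL);
	assert(bug_decide(&p, 1300) == BUG_ACTION_REPORT_KILL);
}
```

The sketch deliberately leaves out the hard part raised above: deciding
when killing only the process is actually safe, which is likely to be
context- and arch-specific.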