On Wed, Oct 05, 2016 at 08:52:54AM -0700, Linus Torvalds wrote:
> On Tue, Oct 4, 2016 at 10:44 PM, Willy Tarreau <w@xxxxxx> wrote:
> >
> > I think instead we should completely remove any simple way to halt the
> > system and document how to do it.
>
> Having slept on it, I suspect you're right. I worry about some
> BUG_ON() that really relies on the killing behavior, but if it takes a
> "real" fault later, that is when it gets killed. And on the whole,
> we've had lots of problems with the killing behavior over the years,
> so we should just try switching BUG_ON() over to non-fatal. It's
> unlikely to be worse than what we have now, as exemplified by this
> event.

I have the same doubts, so I would not want to run the "sed"
immediately, at least to keep the initial intent. But I think that while
everyone is right in his own yard when he puts a BUG_ON() because he
doesn't know how to handle an unsafe situation, he's wrong from a global
perspective. For example, it could be seen as safe to crash the system
in a filesystem driver to protect against the risk of data corruption
resulting from an impossible condition, but when this happens due to a
dirty FS on a USB stick that a person inserts into the PC to save her
work, the BUG_ON() is actually the one responsible for the data loss.
Even something as painful as leaving a process in D state in this
situation would have been cleaner, as it would let the admin reboot when
he wants and not have to experience it at the worst moment.

I've already met 100% reproducible panics that I never had the time to
investigate (one involving running an mmap-based hex editor on /dev/mem,
and the other one doing stupid things with mount --move), and I'm sure
that once I find the cause I'll see a BUG_ON() that should have been a
warning. I'm pretty sure there are historically valid BUG_ON() that are
probably not needed anymore, just like I'm also convinced that some of
them are hard to get rid of.

Maybe at least having the same as WARN_ON() but prepending the dump with
a message saying "you encountered a critical bug which should have
crashed the kernel, you must absolutely report it" would help at the
beginning.

Cheers,
Willy
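
A rough userspace sketch of the "same as WARN_ON() but with a loud
message" idea, for illustration only. SOFT_BUG_ON(), soft_bug_report(),
soft_bug_count and checked_div() are hypothetical names made up for this
sketch and are not kernel interfaces. The only assumption borrowed from
the kernel is that the check returns the truth value of its condition,
as WARN_ON() does, so the caller can fail the operation instead of
halting.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in, not the kernel's BUG_ON()/WARN_ON(). */
static int soft_bug_count;	/* number of "soft bugs" reported so far */

static int soft_bug_report(int cond, const char *expr,
			   const char *file, int line)
{
	if (cond) {
		soft_bug_count++;
		/* Loud banner first, then where the check fired. */
		fprintf(stderr,
			"you encountered a critical bug which should have "
			"crashed the kernel, you must absolutely report it\n");
		fprintf(stderr, "soft BUG: (%s) at %s:%d\n", expr, file, line);
	}
	return cond;
}

/*
 * Evaluates cond once, reports it if true, and yields its truth value
 * so that the caller can bail out instead of halting everything.
 */
#define SOFT_BUG_ON(cond) \
	soft_bug_report(!!(cond), #cond, __FILE__, __LINE__)

/* Example caller: report the "impossible" case and fail only this call. */
static int checked_div(int a, int b, int *out)
{
	if (SOFT_BUG_ON(b == 0))
		return -1;
	*out = a / b;
	return 0;
}
```

With something like this, a caller that hits the "impossible" condition
gets an error back and a report on stderr, and the rest of the program
keeps running, which is the trade-off discussed above.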