Re: Instability in current -git tree

Pavel Tatashin <pasha.tatashin@xxxxxxxxxx> · Fri, 13 Jul 2018 20:19:17 -0400

On 07/13/2018 07:58 PM, Andrew Morton wrote:
> On Fri, 13 Jul 2018 16:51:39 -0700 Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
>> On Fri, Jul 13, 2018 at 4:48 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> (But it would be interesting to see whether removing the check "fixes" it)
>>
>> I'm building a "replace VM_BUG_ON() with proper printk's instead" right now.

I'd like to try to reproduce it as well. Were you able to reproduce this problem in qemu? If so, what were the qemu arguments?

>>
>> Honestly, I think VM_BUG_ON() is complete garbage to begin with. We
>> know the code can't depend on it, since it's only enabled for VM
>> developers. And if it ever triggers, it doesn't get logged because the
>> machine is dead (since the VM code almost always holds critical
>> locks). So it's exactly the worst kind of BUG_ON.
>>
>> Can we turn VM_BUG_ON() into "WARN_ON_ONCE()" and be done with it? The
>> VM developers will actually get better reports, and non-vm-developers
>> don't have dead machines.
>>
> 
> OK by me.  I don't recall ever thinking "gee, I wish the machine had
> crashed at this point".

Changing VM_BUG_ON() to WARN_ON_ONCE() is OK, because it is enabled only with CONFIG_DEBUG_VM.
Sometimes, however, it is better to crash. Examples include possible corruption of user data and security vulnerabilities. Once the kernel gets into a broken state, such as invalid pagetable entries, but continues executing, data written to disk, nvram, or sent over the network is unreliable. Another example is crash dumps that are hard to analyze because the corruption occurred long before the crash.
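
To make the proposed behaviour concrete, here is a rough userspace sketch of a debug-only check that warns once per call site instead of killing the machine. This is not the kernel's implementation: SKETCH_DEBUG_VM, SKETCH_VM_WARN_ON_ONCE(), sketch_warn_count and sketch_check_page() are hypothetical names made up for illustration, with SKETCH_DEBUG_VM standing in for CONFIG_DEBUG_VM.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for CONFIG_DEBUG_VM; remove to compile the checks out. */
#define SKETCH_DEBUG_VM 1

/* Counts warnings actually printed, so the once-only behaviour is observable. */
static int sketch_warn_count;

#ifdef SKETCH_DEBUG_VM
/* Warn the first time the condition is true at this call site, then stay quiet. */
#define SKETCH_VM_WARN_ON_ONCE(cond)                                   \
	do {                                                           \
		static bool warned; /* one flag per call site */       \
		if ((cond) && !warned) {                               \
			warned = true;                                 \
			sketch_warn_count++;                           \
			fprintf(stderr, "WARNING: %s at %s:%d\n",      \
				#cond, __FILE__, __LINE__);            \
		}                                                      \
	} while (0)
#else
/* Checks disabled: the condition is type-checked but never evaluated. */
#define SKETCH_VM_WARN_ON_ONCE(cond) \
	do { (void)sizeof((long)(cond)); } while (0)
#endif

/* Hypothetical caller: report a bad refcount and return an error, do not crash. */
static int sketch_check_page(int refcount)
{
	SKETCH_VM_WARN_ON_ONCE(refcount < 0);
	return refcount < 0 ? -1 : 0;
}
```

With this shape, repeated calls such as sketch_check_page(-1) log a single warning and keep returning an error, so the report reaches the log. The cases above where crashing is the safer choice would still need a separate, unconditional check.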

Pavel