Hello,

a friend of mine started seeing crashes with the 3.18.25 kernel: once appropriate load is put on the machine, it crashes within minutes. He tracked down that reverting commit 874bbfe600a6 (this is the commit ID from Linus' tree; in the stable tree the commit ID is 1e7af294dd03) "workqueue: make sure delayed work run in local cpu" makes the kernel stable again. I'm attaching a screenshot of the crash. Sadly the initial part is missing, but it seems that we crashed when processing timers on an otherwise idle CPU.

This is a production machine, so experimentation is not easy, but if we really need more information it may be possible to reproduce the issue again and gather it.

Does anyone have an idea what is going on? I was looking into the code for a while, but so far I have no good explanation. It would be good to understand the cause instead of just blindly reverting the commit from the stable tree...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
Attachment: delayed-work-oops.png
Description: PNG image