Hello, Peter.

On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote:
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
>
> Why not kill recursive OOM (msgs) ?

Sure, we can do that too, e.g. marking flushing thread and ignoring
new messages from it, although that does come with its own downsides.

The choices are

* If we can make printk safe without much downside, that'd be the best
  option.

* If we decide that we can't do that in a reasonable way, we sure can
  try to plug the identified cases.  We might have to play a bit of
  whack-a-mole (e.g. the feedback loop might not necessarily be from
  the same context) but there likely are very few repeatable cases.

It could be me not knowing the history of the discussion but up until
now the discussion hasn't really gotten to that point since I brought
up the case that we've been seeing.

Thanks.

--
tejun
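
For illustration only, here is a toy userspace model of steps 4 and 5 quoted above, and of the "mark the flushing thread and ignore new messages from it" idea. It is not kernel code and does not reflect how printk is actually implemented. The names flush, failing_netconsole, drop_recursive and max_steps are all hypothetical, and the assumption that every emit produces exactly one new warning is a simplification.

```python
from collections import deque


def flush(ring, emit, drop_recursive, max_steps=1000):
    """Drain `ring`, calling emit(msg, log) for each queued message.

    Returns (steps, drained). `log` is the only way emit() can queue new
    messages; it models a console driver that warns while it is emitting.
    max_steps is an artificial cap so the unguarded case terminates here.
    """
    flushing = False  # stands in for a "this context is flushing" marker

    def log(msg):
        # Guard: ignore messages generated from inside the flush itself.
        if drop_recursive and flushing:
            return
        ring.append(msg)

    steps = 0
    while ring and steps < max_steps:
        msg = ring.popleft()
        flushing = True
        emit(msg, log)
        flushing = False
        steps += 1
    return steps, not ring


def failing_netconsole(msg, log):
    # Hypothetical driver: every send fails to allocate and warns about it.
    log("allocation failure while emitting a message")


def run(drop_recursive):
    # Start with three queued OOM report lines and try to flush them.
    ring = deque("OOM report line %d" % i for i in range(3))
    return flush(ring, failing_netconsole, drop_recursive)


if __name__ == "__main__":
    print("unguarded:", run(False))
    print("guarded:  ", run(True))
```

Without the guard, each emitted message queues another one, so the ring never shrinks and the flusher only stops at the artificial step cap. With the guard, the three original lines drain in three steps, at the cost of discarding the driver's own warnings. The model only covers feedback from the same context; a loop that goes through a different context would not be caught by a marker like this.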