Linus Torvalds wrote:
> Because another thing that tends to affect this is that oom without swap is
> very different from oom with lots of swap, so different people will see
> very different issues. If you have some particular case you want to check,
> and could make a VM image for it, maybe that would get more mm people
> looking at it and agreeing about the issues.

I used to work at a support center troubleshooting RHEL systems. I saw many cases where customers' servers hung or rebooted unexpectedly. In most cases, the servers hung without any OOM killer messages. (I saw a few cases where OOM killer messages were found by analyzing the vmcore.) No messages were recorded in log files such as /var/log/messages and /var/log/sa/ at the time of the hang. According to /var/log/sa/ , there was little free memory just before the servers hung. I suspected that some memory-related problem had occurred and suggested that customers set up a serial console or netconsole in case the kernel was printing messages, but I don't know whether they were able to set up a serial console or netconsole on their production systems.

The origin of this OOM livelock discussion was a local OOM-DoS vulnerability that has existed since Linux 2.0. When I tested this vulnerability on RHEL 7, I saw strange stalls on XFS. The discussion went public after a reproducer that does not make use of the vulnerability was developed. We recognized the "too small to fail" memory-allocation rule. I tested various corner cases using variants of the reproducer. I realized that there is a race window in which a memory allocation can fall into an infinite loop without any OOM killer messages. I formed the hypothesis that customers' servers hit a race where __GFP_FS allocations are blocked at too_many_isolated() or on unkillable locks in direct reclaim paths, whereas !__GFP_FS allocations retry forever without calling out_of_memory().
But even if they set up a serial console or netconsole, we currently emit no warning messages. The timeout-based OOM warning corresponds to check_memalloc_delay() in http://marc.info/?l=linux-kernel&m=143239201905479 . The timeout-based OOM warning is not only for stalls after an OOM victim has been chosen but also for stalls before an OOM victim is chosen.

Whether we should call out_of_memory() upon timeout might depend on hardware, RAM, swap, workload, etc. But I think that whether we can have a mechanism for warning about a possible OOM livelock is an independent question. Thus, I think that making a VM image is not helpful.
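
To illustrate the idea of a timeout-based warning in isolation, here is a toy user-space model in Python. It is not the check_memalloc_delay() patch and not kernel code: the class StallWatchdog, its methods, the 10-second threshold, and the warning text are all assumptions made up for this sketch. It only shows that a periodic check can report any allocation that has been in flight longer than a timeout, regardless of whether an OOM victim has been chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StallWatchdog:
    """Toy model: report allocations that have been in flight too long."""

    timeout: float = 10.0  # seconds; hypothetical threshold
    # task name -> time at which its allocation attempt started
    in_flight: Dict[str, float] = field(default_factory=dict)

    def start(self, task: str, now: float) -> None:
        # Record that a task entered the (modelled) allocation slow path.
        self.in_flight[task] = now

    def finish(self, task: str) -> None:
        # The allocation returned, so stop tracking this task.
        self.in_flight.pop(task, None)

    def check(self, now: float) -> List[str]:
        # Periodic check: one warning per task stalled for >= timeout.
        # The message format is invented for this example.
        return [
            f"warning: {task} has been allocating for {now - started:.0f}s"
            for task, started in self.in_flight.items()
            if now - started >= self.timeout
        ]
```

In this model, a caller would invoke start() before an allocation, finish() after it returns, and check() from a periodic timer. A task that never reaches finish() keeps being reported, which is the property that matters for a stall that produces no OOM killer messages.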