On Fri, Jan 29, 2016 at 03:38:06PM +0100, Michal Hocko wrote: > That would require the oom victim to release the memory and drop > TIF_MEMDIE before we go out_of_memory again. And that might happen > anytime whether we are holding oom_trylock or not because it doesn't > synchronize the exit path. So we are basically talking about: > > should_alloc_retry > [1] > get_page_from_freelist(ALLOC_WMARK_HIGH) > [2] > out_of_memory > > and the race window for 1 is much smaller than 2 because [2] is quite [1] is before should_alloc_retry is set. It covers the entire reclaim. > costly operation. I wonder if this last moment request ever succeeds. I > have run my usual oom flood tests and it hasn't shown up a single time. For this check to make a difference, you need a lot of small programs all hitting OOM at the same time. Perhaps the trylock on the oom_lock tends to hide the race like you suggested earlier but it doesn't sound accurate if we proceed to oom kill without checking the high wmark at all before killing another task after a random reclaim failure. Also note there's no CPU to save here, this is a very slow path, anything that can increase accuracy and avoid OOM kill false positives (at practical zero CPU cost like here) sounds worth it. > That being said I do not care that much. I just find this confusing and > basically pointless because the whole thing is racy by definition and we > are trying to cover a smaller window. I would understand if we did such > a last attempt right before we are going to kill a selected victim. This > would cover much larger race window. The high wmark itself is still an arbitrary value so yes, it isn't perfect, but the whole OOM killing is an heuristic, so tiny race window to me sounds better than huge race window. Moving this check down inside out_of_memory to reduce the window even further is quite a different proposition than removing the check. Currently we're doing this check after holding the oom_lock, back in 2.6.x it was more more racy, now thanks to the oom_lock it's way more reliable. If you want to increase reliability further I sure agree, but removing the check would drop reliability instead so I don't see how it could be preferable. We can increase reliability further if we'd move this high wmark check after select_bad_process() returned a task (and not -1UL) to be sure all TIF_MEMDIE tasks already were flushed out, before checking the high wmark. Just it would complicate the code and that's probably why it wasn't done. Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>