On Wed 14-01-15 22:27:45, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Wed, Jan 07, 2015 at 03:28:04PM +0100, Michal Hocko wrote:
> > Instead we shouldn't pretend that GFP_KERNEL is basically GFP_NOFAIL.
> > The question is how to get there without too many regressions IMHO.
> > Or maybe we should simply bite a bullet and don't be cowards and simply
> > deal with bugs as they come. If something really cannot deal with the
> > failure it should tell that by a proper flag.
>
> Not related to memcg but related to GFP_NOFAIL behavior, a couple of
> months ago while stress testing some code I've been working on, I run
> into several OOM livelocks which may be the same you're reporting here
> and I reliably fixed those (at least for my load) so I could keep
> going with my work. I didn't try to submit these changes yet, but this
> discussion rings a bell... so I'm sharing my changes below in this
> thread in case it may help:
>
> http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=00e91f97df9861454f7e0701944d7de2c382ffb9

OK, this is interesting. We do fail !GFP_FS allocations, but did_some_progress
might prevent us from ever reaching __alloc_pages_may_oom, which is where we
fail them. This can lead to thrashing when reclaim makes some progress but
that progress doesn't help the allocation succeed. It can take many retries
until no more progress can be made, so the allocation fails much later.

I do agree that failing earlier is slightly better, even though the result
would be more allocation failures, which have hard-to-predict outcomes.
Anyway, callers should be prepared for the failure, and we can hardly think
about performance under such conditions. I would happily ack such a patch if
you post it.

> http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=a0fcf2323b2e4cffd750c1abc1d2c138acdefcc8

I am not sure about this one because TIF_MEMDIE is there to give access to
memory reserves.
GFP_NOFAIL shouldn't mean the same, because then it would be much harder to
"guarantee" that the reserves wouldn't be depleted completely. So I do not
like this much. Besides that, I think that a GFP_NOFAIL allocation blocking an
OOM victim is a plain bug. grow_dev_page is relying on GFP_NOFAIL, but I am
wondering whether ext4 can do something to pre-allocate so that it doesn't
have to call it.

> http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?id=798b7f9d549664f8c0007c6416a2568eedd75d6a

I think this should be fixed in the filesystem rather than papered over.

Thanks!
-- 
Michal Hocko
SUSE Labs