On Mon 05-12-16 22:45:19, Tetsuo Handa wrote: > Michal Hocko wrote: > > __alloc_pages_may_oom makes sure to skip the OOM killer depending on > > the allocation request. This includes lowmem requests, costly high > > order requests and others. For a long time __GFP_NOFAIL acted as an > > override for all those rules. This is not documented and it can be quite > > surprising as well. E.g. GFP_NOFS requests are not invoking the OOM > > killer but GFP_NOFS|__GFP_NOFAIL does so if we try to convert some of > > the existing open coded loops around allocator to nofail request (and we > > have done that in the past) then such a change would have a non trivial > > side effect which is not obvious. Note that the primary motivation for > > skipping the OOM killer is to prevent from pre-mature invocation. > > > > The exception has been added by 82553a937f12 ("oom: invoke oom killer > > for __GFP_NOFAIL"). The changelog points out that the oom killer has to > > be invoked otherwise the request would be looping for ever. But this > > argument is rather weak because the OOM killer doesn't really guarantee > > any forward progress for those exceptional cases - e.g. it will hardly > > help to form costly order - I believe we certainly do not want to kill > > all processes and eventually panic the system just because there is a > > nasty driver asking for order-9 page with GFP_NOFAIL not realizing all > > the consequences - it is much better this request would loop for ever > > than the massive system disruption, lowmem is also highly unlikely to be > > freed during OOM killer and GFP_NOFS request could trigger while there > > is still a lot of memory pinned by filesystems. > > I disagree. I believe that panic caused by OOM killer is much much better > than a locked up system. I hate to add new locations that can lockup inside > page allocator. This is __GFP_NOFAIL and reclaim has failed. As a matter of fact any __GFP_NOFAIL can lockup inside the allocator. Full stop. There is no guaranteed way to make a forward progress with the current page allocator implementation. So we are somewhere in the middle between pre-mature and pointless system disruption (GFP_NOFS with a lots of metadata or lowmem request) where the OOM killer even might not help and potential lockup which is inevitable with the current design. Dunno about you but I would rather go with the first option. To be honest I really fail to understand your line of argumentation. We have this do { cond_resched(); } (page = alloc_page(GFP_NOFS)); vs. page = alloc_page(GFP_NOFS | __GFP_NOFAIL); the first one doesn't invoke OOM killer while the later does. This discrepancy just cannot make any sense... The same is true for alloc_page(GFP_DMA) vs alloc_page(GFP_DMA|__GFP_NOFAIL) Now we can discuss whether it is a _good_ idea to not invoke OOM killer for those exceptions but whatever we do __GFP_NOFAIL is not a way to give such a subtle side effect. Or do you disagree even with that? -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>