On Wed 18-02-15 20:23:19, Tetsuo Handa wrote:
> [ cc fsdevel list - watch out for side effect of 9879de7373fc (mm: page_alloc:
> embed OOM killing naturally into allocation slowpath) which was merged between
> 3.19-rc6 and 3.19-rc7, started from
> http://marc.info/?l=linux-mm&m=142348457310066&w=2 ]
>
> Replying in this post picked up from several posts in this thread.
>
> Michal Hocko wrote:
> > Besides that __GFP_WAIT callers should be prepared for the allocation
> > failure and should better cope with it. So no, I really hate something
> > like the above.
>
> Those who do not want to retry with invoking the OOM killer are using
> __GFP_WAIT + __GFP_NORETRY allocations.
>
> Those who want to retry with invoking the OOM killer are using
> __GFP_WAIT allocations.
>
> Those who must retry forever with invoking the OOM killer, no matter how
> many processes the OOM killer kills, are using __GFP_WAIT + __GFP_NOFAIL
> allocations.
>
> However, since use of __GFP_NOFAIL is prohibited,

IT IS NOT PROHIBITED. It is highly discouraged because GFP_NOFAIL is a
strong requirement and the caller should be really aware of the
consequences, especially when the allocation is done under a locked
context.

> I think many of
> __GFP_WAIT users are expecting that the allocation fails only when
> "the OOM killer set TIF_MEMDIE flag to the caller but the caller
> failed to allocate from memory reserves".

This is not what __GFP_WAIT is defined for. It only says that the
allocator might sleep.

> Also, the implementation
> before 9879de7373fc (mm: page_alloc: embed OOM killing naturally
> into allocation slowpath) effectively supported __GFP_WAIT users
> with such expectation.

That is the same as the current GFP_KERNEL == GFP_NOFAIL behavior for
small allocations, which causes a lot of trouble that was not
anticipated at the time it was introduced. And we _should_ move away
from that model, because GFP_NOFAIL should be really explicit rather
than implicit.
> Michal Hocko wrote:
> > Because they cannot perform any IO/FS transactions and that would lead
> > to a premature OOM conditions way too easily. OOM killer is a _last
> > resort_ reclaim opportunity not something that would happen just because
> > you happen to be not able to flush dirty pages.
>
> But you should not have applied such change without making necessary
> changes to GFP_NOFS / GFP_NOIO users with such expectation and testing
> at linux-next.git . Applying such change after 3.19-rc6 is a sucker punch.

This is nonsense. The OOM killer has been disabled for !__GFP_FS
allocations for ages (since before the git era).

> Michal Hocko wrote:
> > Well, you are beating your machine to death so you can hardly get any
> > time guarantee. It would be nice to have a better feedback mechanism to
> > know when to back off and fail the allocation attempt which might be
> > blocking OOM victim to pass away. This is extremely tricky because we
> > shouldn't be too eager to fail just because of a sudden memory pressure.
>
> Michal Hocko wrote:
> > > I wish only somebody like kswapd repeats the loop on behalf of all
> > > threads waiting at memory allocation slowpath...
> >
> > This is the case when the kswapd is _able_ to cope with the memory
> > pressure.
>
> It looks wasteful for me that so many threads (greater than number of
> available CPUs) are sleeping at cond_resched() in shrink_slab() when
> checking SysRq-t. Imagine 1000 threads sleeping at cond_resched() in
> shrink_slab() on a machine with only 1 CPU. Each thread gets a chance
> to try calling reclaim function only when all other threads gave that
> thread a chance at cond_resched(). Such situation is almost mutually
> preventing from making progress. I wish the following mechanism.

Feel free to send patches which do not break other workloads...

[...]

> Michal Hocko wrote:
> > Failing __GFP_WAIT allocation is perfectly fine IMO. Why do you think
> > this is a problem?
>
> Killing a user space process or taking filesystem error actions (e.g.
> remount-ro or kernel panic), which choice is less painful for users?
> I believe that !(gfp_mask & __GFP_FS) check is a bug and should be removed.

Invoking the OOM killer prematurely, just because the current allocation
context doesn't allow for real reclaim, is even worse.

> Rather, shouldn't allocations without __GFP_FS get more chance to succeed
> than allocations with __GFP_FS? If I were the author, I might have added
> below check instead.
>
>     /* This is not a critical allocation. Don't invoke the OOM killer. */
>     if (gfp_mask & __GFP_FS)
>             goto out;

This doesn't make any sense whatsoever. With that check, regular
GFP_KERNEL and GFP_USER allocations would never invoke the OOM killer.
That includes page faults and basically most allocations.

> Falling into retry loop with same watermark might prevent rescuer threads from
> doing memory allocation which is needed for making free memory. Maybe we should
> use lower watermark for GFP_NOIO and below, middle watermark for GFP_NOFS, high
> watermark for GFP_KERNEL and above.

--
Michal Hocko
SUSE Labs