On 2016/9/9 19:44, Michal Hocko wrote:
> On Tue 06-09-16 22:47:06, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>
>> Some hungtask come up when I run the trinity, and OOM occurs
>> frequently.
>> A task hold lock to allocate memory, due to the low memory,
>> it will lead to oom. at the some time , it will retry because
>> it find that oom is in progress. but it always allocate fails,
>> the freed memory was taken away quickly.
>> The patch fix it by limit times to avoid hungtask and livelock
>> come up.
> Which kernel has shown this issue? Since 4.6 IIRC we have oom reaper
> responsible for the async memory reclaim from the oom victim and later
> changes should help to reduce oom lockups even further.
>
> That being said this is not a right approach. It is even incorrect
> because it allows __GFP_NOFAIL to fail now. So NAK to this patch.
>
>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
>> ---
>>  mm/page_alloc.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a178b1d..0dcf08b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  	enum compact_result compact_result;
>>  	int compaction_retries = 0;
>>  	int no_progress_loops = 0;
>> +	int oom_failed = 0;
>>  
>>  	/*
>>  	 * In the slowpath, we sanity check order to avoid ever trying to
>> @@ -3645,8 +3646,13 @@ retry:
>>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>>  	if (page)
>>  		goto got_pg;
>> +	else
>> +		oom_failed++;
>> +
>> +	/* more than limited times will drop out */
>> +	if (oom_failed > MAX_RECLAIM_RETRIES)
>> +		goto nopage;
>>  
>> -	/* Retry as long as the OOM killer is making progress */
>>  	if (did_some_progress) {
>>  		no_progress_loops = 0;
>>  		goto retry;
>> -- 
>> 1.8.3.1

Hi Michal,

The OOM reaper can indeed speed up memory recovery, but this patch targets an extreme scenario that I hit while running trinity. I think the scenario can occur with or without the OOM reaper.

__GFP_NOFAIL should indeed be taken into account; thank you for pointing that out. The updated patch follows.

Thanks,
zhongjiang

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a178b1d..47804c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum compact_result compact_result;
 	int compaction_retries = 0;
 	int no_progress_loops = 0;
+	int oom_failed = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -3645,8 +3646,15 @@ retry:
 	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
 	if (page)
 		goto got_pg;
+	else
+		oom_failed++;
+
+	/* more than limited times will drop out */
+	if (oom_failed > MAX_RECLAIM_RETRIES) {
+		WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+		goto nopage;
+	}
 
-	/* Retry as long as the OOM killer is making progress */
 	if (did_some_progress) {
 		no_progress_loops = 0;
 		goto retry;