Re: [PATCH] mm: fix oom work when memory is under pressure

zhong jiang <zhongjiang@xxxxxxxxxx> · Mon, 12 Sep 2016 17:51:06 +0800

On 2016/9/9 19:44, Michal Hocko wrote:
> On Tue 06-09-16 22:47:06, zhongjiang wrote:
>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>
>> Hung tasks show up when I run trinity, and OOM occurs
>> frequently.
>> A task holds a lock while allocating memory. Because memory is low,
>> the allocation triggers OOM. At the same time, the task retries because
>> it finds that OOM is in progress, but the allocation always fails
>> because the freed memory is taken away quickly.
>> The patch fixes this by limiting the number of retries, to avoid hung
>> tasks and livelock.
> Which kernel has shown this issue? Since 4.6 IIRC we have oom reaper
> responsible for the async memory reclaim from the oom victim and later
> changes should help to reduce oom lockups even further.
>
> That being said this is not a right approach. It is even incorrect
> because it allows __GFP_NOFAIL to fail now. So NAK to this patch.
>
>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
>> ---
>>  mm/page_alloc.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a178b1d..0dcf08b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>>  	enum compact_result compact_result;
>>  	int compaction_retries = 0;
>>  	int no_progress_loops = 0;
>> +	int oom_failed = 0;
>>  
>>  	/*
>>  	 * In the slowpath, we sanity check order to avoid ever trying to
>> @@ -3645,8 +3646,13 @@ retry:
>>  	page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
>>  	if (page)
>>  		goto got_pg;
>> +	else
>> +		oom_failed++;
>> +
>> +	/* more than limited times will drop out */
>> +	if (oom_failed > MAX_RECLAIM_RETRIES)
>> +		goto nopage;
>>  
>> -	/* Retry as long as the OOM killer is making progress */
>>  	if (did_some_progress) {
>>  		no_progress_loops = 0;
>>  		goto retry;
>> -- 
>> 1.8.3.1
Hi Michal,

The oom reaper can indeed speed up memory recovery, but this patch addresses
an extreme scenario that I hit by running trinity. I think the scenario can
happen whether or not the oom reaper is present.

__GFP_NOFAIL should be taken into account. Thank you for the reminder. The
updated patch follows.

Thanks
zhongjiang

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a178b1d..47804c1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3457,6 +3457,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        enum compact_result compact_result;
        int compaction_retries = 0;
        int no_progress_loops = 0;
+       int oom_failed = 0;

        /*
         * In the slowpath, we sanity check order to avoid ever trying to
@@ -3645,8 +3646,15 @@ retry:
        page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
        if (page)
                goto got_pg;
+       else
+               oom_failed++;
+
+       /* more than limited times will drop out */
+       if (oom_failed > MAX_RECLAIM_RETRIES) {
+               WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL);
+               goto nopage;
+       }

-       /* Retry as long as the OOM killer is making progress */
        if (did_some_progress) {
                no_progress_loops = 0;
                goto retry;
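
The control flow of the hunk above can be sketched in userspace to make the
retry limit easier to reason about. This is only a model, not kernel code:
sketch_alloc_may_oom(), struct sketch_allocator, SKETCH_GFP_NOFAIL and
SKETCH_MAX_RETRIES are hypothetical stand-ins for __alloc_pages_may_oom(),
the allocator state, __GFP_NOFAIL and MAX_RECLAIM_RETRIES, and the value 16 is
an assumption. The sketch also assumes the OOM killer always reports
progress, which is the livelock case described in the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; none of these names are kernel APIs. */
#define SKETCH_MAX_RETRIES 16   /* assumed to mirror MAX_RECLAIM_RETRIES */
#define SKETCH_GFP_NOFAIL  0x1u /* models __GFP_NOFAIL */

struct sketch_allocator {
	int attempts_until_success; /* negative: allocation never succeeds */
	int attempts_made;          /* how often the allocator was called */
	int warnings;               /* counts the modelled WARN_ON_ONCE hits */
};

static char sketch_page[1];     /* dummy object standing in for a page */

/* Models __alloc_pages_may_oom(): fail N times, then succeed. */
static void *sketch_alloc_may_oom(struct sketch_allocator *a)
{
	a->attempts_made++;
	if (a->attempts_until_success >= 0 &&
	    a->attempts_made > a->attempts_until_success)
		return sketch_page;
	return NULL;
}

/*
 * Models the patched retry loop: count failed OOM attempts and give up
 * once the count exceeds the limit. As in the hunk, a NOFAIL request
 * only produces a warning; it still bails out with NULL.
 */
static void *sketch_slowpath(struct sketch_allocator *a, unsigned int flags)
{
	int oom_failed = 0;

	assert(a != NULL);
	for (;;) {
		void *page = sketch_alloc_may_oom(a);

		if (page)
			return page;            /* goto got_pg */
		oom_failed++;

		if (oom_failed > SKETCH_MAX_RETRIES) {
			if (flags & SKETCH_GFP_NOFAIL) {
				a->warnings++;
				fprintf(stderr, "sketch: NOFAIL request gave up\n");
			}
			return NULL;            /* goto nopage */
		}
		/* did_some_progress is assumed true here: goto retry */
	}
}
```

With an allocator that never succeeds, sketch_slowpath() returns NULL after
SKETCH_MAX_RETRIES + 1 attempts. In this model a request carrying
SKETCH_GFP_NOFAIL also returns NULL, only with a warning recorded first.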


