RE: Possible deadloop in direct reclaim?

Lisa Du <cldu@xxxxxxxxxxx> · Wed, 31 Jul 2013 22:19:53 -0700

Looping in Russell King.
Would you please comment on the questions below that Mr. Motohiro asked about fork allocating order-2 memory? Thanks in advance!
>(7/31/13 10:24 PM), Lisa Du wrote:
>> Dear Kosaki
>>     Would you please help to check my comment as below:
>>> (7/25/13 9:11 PM), Lisa Du wrote:
>>>> Dear KOSAKI
>>>>      In my test, I didn't set compaction. Maybe compaction is helpful to avoid this issue. I can try that later.
>>>>      In my mind CONFIG_COMPACTION is an optional configuration, right?
>>>
>>> Right. But if you don't set it, application must NOT use >1 order allocations.
>>> It doesn't work and it is expected result.
>>> That's your application mistake.
>> Dear Kosaki, I have two questions on your explanation:
>> a) You said that if CONFIG_COMPACTION is not set, applications must NOT use >1 order allocations. Is there any documentation for this theory?
>
>Sorry, I don't understand what "this" means. I mean, even if you use a desktop or server machine, a kernel without compaction easily gets into situations with no free order-2 pages.
>Therefore, our in-kernel subsystems avoid order-2 allocations as far as possible.
Thanks, now I got your point. 
>
>
>> b) My order-2 allocation does not come from an application, but from do_fork, which is in kernel space.
>>    In my mind, when a parent process forks a child process, it needs to allocate order-2 memory.
>>    If a) is right, then shouldn't CONFIG_COMPACTION be a mandatory configuration for the Linux kernel rather than an optional one?
>
>???
>fork allocates order-1 memory for the stack. Where and why does it allocate order-2? If it is arch-specific code, please contact the arch maintainer.
Yes, the arch code in do_fork allocates order-2 memory during copy_process.
Hi Russell,
What's your opinion on this question?
If we really need order-2 memory for fork, then we'd better set CONFIG_COMPACTION, right?
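For reference, an order-n allocation is 2^n physically contiguous pages. A minimal sketch of that arithmetic, assuming 4 KiB pages (the page size and the helper names here are my assumptions for illustration, not kernel code):

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Number of contiguous pages in an order-n allocation: 2^n. */
static unsigned long order_to_pages(unsigned int order)
{
	return 1UL << order;
}

/* Size in bytes of an order-n allocation. */
static unsigned long order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Hypothetical self-check, not kernel code. */
static void order_sketch_check(void)
{
	assert(order_to_pages(1) == 2);
	assert(order_to_pages(2) == 4);
	assert(order_to_bytes(2) == 16384);
}
```

Under that assumption an order-2 request needs four contiguous free pages (16 KiB), which is harder to satisfy on a fragmented system than an order-0 or order-1 request.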
>
>
>
>>>
>>>>      If we don't use it and hit such an issue, how should we deal with such an infinite loop?
>>>>
>>>>      I made a change in the all_unreclaimable() function that passed overnight tests; please help review, thanks in advance!
>>>> @@ -2353,7 +2353,9 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>>>>                           continue;
>>>>                   if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>>>>                           continue;
>>>> -               if (!zone->all_unreclaimable)
>>>> +               if (zone->all_unreclaimable)
>>>> +                       continue;
>>>> +               if (zone_reclaimable(zone))
>>>>                           return false;
>>>
>>> Please tell me why you changed this.
>> The original check returns false as soon as it finds that zone->all_unreclaimable is false, so did_some_progress ends up non-zero. Then another round of direct reclaim is performed.
>>  But I think zone->all_unreclaimable is not always reliable. In my case, for example, kswapd goes to sleep and no one changes this flag. We should also check zone_reclaimable(zone) when zone->all_unreclaimable = 0 to confirm that a zone is really reclaimable. This change also
>>  avoids the issue you described in the commit below:
>
>Please read the older code. The code you pointed to was a temporary change, and I changed it back to fix bugs.
>If you look at the status in the middle of direct reclaim, we can't avoid race conditions between multiple direct reclaimers. Moreover, if kswapd doesn't wake up, that is a problem. This is the reason why the current code behaves as you described.
>I agree we should fix your issue as far as possible. But I can't agree with your analysis.
I read the code you modified, which checks zone->all_unreclaimable instead of zone_reclaimable(zone)
(commit 929bea7c714, "vmscan: all_unreclaimable() use zone->all_unreclaimable as a name").
Your patch was trying to fix the case where zone->all_unreclaimable = 1 but zone->pages_scanned = 0, which made all_unreclaimable() return false.
Is there anything else I missed or misunderstood?
In my change, I first check zone->all_unreclaimable; if it is set to 1, I don't check zone->pages_scanned at all.
My point is that zone->all_unreclaimable = 0 doesn't mean the zone is always reclaimable, because zone->all_unreclaimable can only be set by kswapd.
In my case kswapd had already fully scanned all zones and still couldn't rebalance the system for high-order allocations. It then rechecked all watermarks at order-0, and since those watermarks were OK, kswapd went back to sleep. Unfortunately, kswapd was not woken again because no higher-order allocation woke it for a long time. Meanwhile this process kept running direct reclaim again and again because zone->all_unreclaimable remained 0.
So I also check zone->pages_scanned when zone->all_unreclaimable = 0: only if zone_reclaimable() returns true is the zone really reclaimable for the direct reclaimer. Would this change break your bug fix?
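To make the difference concrete, here is a minimal user-space sketch of the two checks. It is only a model: struct zone_model and its reclaimable_pages field are hypothetical stand-ins for struct zone and zone_reclaimable_pages(), the cpuset and populated-zone filtering is left out, and the "scanned fewer than 6x the reclaimable pages" rule is my reading of zone_reclaimable() in mm/vmscan.c, so please treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	bool all_unreclaimable;          /* flag that only kswapd sets */
	unsigned long pages_scanned;     /* pages scanned since pages were last freed */
	unsigned long reclaimable_pages; /* stand-in for zone_reclaimable_pages() */
};

/* Assumed heuristic: still reclaimable while scanned < 6 * reclaimable. */
static bool zone_reclaimable(const struct zone_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/* Current check: any zone whose flag is clear counts as reclaimable. */
static bool all_unreclaimable_current(const struct zone_model *zones, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!zones[i].all_unreclaimable)
			return false;
	return true;
}

/* Proposed check: a clear flag must be confirmed by zone_reclaimable(). */
static bool all_unreclaimable_proposed(const struct zone_model *zones, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (zones[i].all_unreclaimable)
			continue;
		if (zone_reclaimable(&zones[i]))
			return false;
	}
	return true;
}

/* Self-check for the case from this thread: flag never set, zone over-scanned. */
static void reclaim_sketch_check(void)
{
	struct zone_model stuck = { false, 6000, 1000 };

	assert(!all_unreclaimable_current(&stuck, 1)); /* direct reclaim retries */
	assert(all_unreclaimable_proposed(&stuck, 1)); /* direct reclaim gives up */
}
```

In this model a zone that kswapd never flagged but that has been scanned six times over is still reported as reclaimable by the current check, while the proposed check reports it as unreclaimable. A zone with the flag already set is treated the same by both.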

Thanks to Bob's finding, I read through the thread below, and the issue your patch was trying to fix is the same as mine:
mm, vmscan: fix do_try_to_free_pages() livelock
https://lkml.org/lkml/2012/6/14/74
I have the same question as Bob: you had already found this issue, so why wasn't this patch merged?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .