Re: [PATCH] mm: fix oom work when memory is under pressure

zhong jiang <zhongjiang@xxxxxxxxxx> · Wed, 14 Sep 2016 15:13:50 +0800

On 2016/9/13 21:28, Michal Hocko wrote:
> On Tue 13-09-16 21:13:21, zhong jiang wrote:
>> On 2016/9/13 1:44, Michal Hocko wrote:
> [...]
>>> If you want to solve this problem properly then you would have to give
>>> tasks which are looping in the page allocator access to some portion of
>>> memory reserves. This is quite tricky to do right, though.
>> Using some portion of the memory reserves has almost no effect in such a
>> starvation scenario. I think the hung task will still occur; it does
>> not solve the underlying problem.
> Granting access to memory reserves is of course not a full solution, but
> it raises the chances of forward progress. Other solutions would have to
> guarantee that the memory reclaimed on behalf of the requester will be
> given to the requester. Not an easy task.
>
>>> Retry counters with the fail path have been proposed in the past and not
>>> accepted.
>> The above patch has been tested by running trinity, and the problem
>> is fixed. Is there any reasonable argument against the patch, or
>> will it bring in any side effects?
> Sure there is. Low-order allocations have traditionally been non-failing,
> and changing that behavior is a major obstacle because it opens the
> door to many bugs. I've tried to do something similar in the past, and
> there was strong resistance against it. Believe me, been there, done
> that...
>
Hi Michal,

Recently, I hit the same issue when running an OOM test case from LTP with KSM enabled.

[  601.937145] Call trace:
[  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
[  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
[  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
[  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
[  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
[  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The root cause is that ksmd holds the read lock and never releases it, so the exiting task in the trace above blocks forever in down_write() from __ksm_exit():
  scan_get_next_rmap_item()
    down_read()                  <- read lock taken
      get_next_rmap_item()
        alloc_rmap_item()        <- ksmd loops here forever

How do you think this situation should be handled? Or should we leave the issue alone?

Thanks
zhongjiang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .