Re: [PATCH] mm: fix oom work when memory is under pressure

zhong jiang <zhongjiang@xxxxxxxxxx> · Sun, 18 Sep 2016 12:04:47 +0800

On 2016/9/17 6:13, Hugh Dickins wrote:
> On Wed, 14 Sep 2016, Michal Hocko wrote:
>> On Wed 14-09-16 10:42:19, Michal Hocko wrote:
>>> [Let's CC Hugh]
>> now for real...
>>
>>> On Wed 14-09-16 15:13:50, zhong jiang wrote:
>>> [...]
>>>>   hi, Michal
>>>>
>>>>   Recently, I hit the same issue when running an OOM test case from LTP with KSM enabled.
>>>>  
>>>> [  601.937145] Call trace:
>>>> [  601.939600] [<ffffffc000086a88>] __switch_to+0x74/0x8c
>>>> [  601.944760] [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
>>>> [  601.950007] [<ffffffc000a1c09c>] schedule+0x3c/0x94
>>>> [  601.954907] [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
>>>> [  601.961289] [<ffffffc000a1e32c>] down_write+0x64/0x80
>>>> [  601.966363] [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
>>>> [  601.971523] [<ffffffc0000be650>] mmput+0x118/0x11c
>>>> [  601.976335] [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
>>>> [  601.981321] [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
>>>> [  601.986656] [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
>>>> [  601.991904] [<ffffffc000089fcc>] do_signal+0x1d8/0x450
>>>> [  601.997065] [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
>>> So this is a hung task triggering because the exiting task cannot get
>>> the mmap sem for write because the ksmd holds it for read while
>>> allocating memory which just takes ages to complete, right?
>>>
>>>> The root cause is that ksmd holds the read lock and never releases it.
>>>>  scan_get_next_rmap_item
>>>>          down_read
>>>>                    get_next_rmap_item
>>>>                              alloc_rmap_item     #ksmd will loop permanently.
>>>>
>>>> How do you see this kind of situation? Or should we leave the issue alone?
>>> I am not familiar with the ksmd code so it is hard for me to judge but
>>> one thing to do would be __GFP_NORETRY which would force a bail out from
>>> the allocation rather than looping for ever. A quick look tells me that
>>> the allocation failure here is quite easy to handle. There might be
>>> others...
> Yes, very good suggestion in this case: the ksmd code does exactly the
> right thing when that allocation fails, but was too stupid to use an
> allocation mode which might fail - and it can allocate rather a lot of
> slots along that path, so it will be good to let it break out there.
>
> Thank you, Zhongjiang, please send akpm a fully signed-off patch, tagged
> for stable, with your explanation above (which was a lot more helpful
> to me than what you wrote in your other mail of Sept 13th).  But please
> make it GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN (and break that line
> before 80 cols): the allocation will sometimes fail, and we're not at
> all interested in hearing about that.
>
> Michal, how would you feel about this or a separate patch adding
> __GFP_HIGH to the allocation in ksm's alloc_stable_node()?  That
> allocation could cause the same problem, but it is much less common
> (so less important to do anything about it), and differs from the
> rmap_item case in that if it succeeds, it will usually free a page;
> whereas if it fails, the fallback (two break_cow()s) may want to
> allocate a couple of pages.  So __GFP_HIGH makes more sense for it
> than __GFP_NORETRY: but perhaps we prefer not to add __GFP_HIGHs?
>
> Hugh
>
> .
>
  I agree, this does make progress.  If alloc_stable_node() fails to allocate
  memory, the fallback has to obtain more memory from the kernel at the same
  time, so the memory pressure will suddenly increase.

diff --git a/mm/ksm.c b/mm/ksm.c
index 5048083..72dc475 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -299,7 +299,7 @@ static inline void free_rmap_item(struct rmap_item *rmap_item)

 static inline struct stable_node *alloc_stable_node(void)
 {
-       return kmem_cache_alloc(stable_node_cache, GFP_KERNEL);
+       return kmem_cache_alloc(stable_node_cache, GFP_KERNEL | __GFP_HIGH);
 }
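
For reference, here is a small user-space sketch of how the flag masks
discussed in this thread compose.  It is not kernel code: the bit values
and the helper names (gfp_may_reclaim, stable_node_gfp, rmap_item_gfp) are
made up for illustration, and the real definitions live in
include/linux/gfp.h.  The only thing it is meant to show is that GFP_KERNEL
is itself a combination of reclaim/IO/FS bits, so a modifier such as
__GFP_HIGH or __GFP_NORETRY is OR-ed onto it, and a bare __GFP_HIGH mask
carries no reclaim bits at all.

```c
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical bit values, for illustration only (assumption). */
#define __GFP_HIGH	0x01u	/* may dip into emergency reserves */
#define __GFP_IO	0x02u	/* may start physical I/O */
#define __GFP_FS	0x04u	/* may call into the filesystem */
#define __GFP_RECLAIM	0x08u	/* may reclaim memory */
#define __GFP_NORETRY	0x10u	/* give up instead of looping */
#define __GFP_NOWARN	0x20u	/* stay quiet on failure */

/* GFP_KERNEL is a composite mask, not a single bit. */
#define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* True if an allocation with these flags is allowed to reclaim. */
static bool gfp_may_reclaim(gfp_t flags)
{
	return (flags & __GFP_RECLAIM) != 0;
}

/* Mask for the stable_node case: normal allocation plus reserves. */
static gfp_t stable_node_gfp(void)
{
	return GFP_KERNEL | __GFP_HIGH;
}

/* Mask Hugh suggested for the rmap_item case: fail fast and quietly. */
static gfp_t rmap_item_gfp(void)
{
	return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
}
```

With these made-up values, gfp_may_reclaim(stable_node_gfp()) is true,
while gfp_may_reclaim(__GFP_HIGH) is false.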
