[PATCH] mm, oom: distinguish blockable mode for mmu notifiers

christian.koenig@xxxxxxx (Christian König) · Fri, 24 Aug 2018 14:52:26 +0200

Am 24.08.2018 um 14:33 schrieb Michal Hocko:
> On Fri 24-08-18 14:18:44, Christian KÃ¶nig wrote:
>> Am 24.08.2018 um 14:03 schrieb Michal Hocko:
>>> On Fri 24-08-18 13:57:52, Christian KÃ¶nig wrote:
>>>> Am 24.08.2018 um 13:52 schrieb Michal Hocko:
>>>>> On Fri 24-08-18 13:43:16, Christian KÃ¶nig wrote:
>>> [...]
>>>>>> That won't work like this there might be multiple
>>>>>> invalidate_range_start()/invalidate_range_end() pairs open at the same time.
>>>>>> E.g. the lock might be taken recursively and that is illegal for a
>>>>>> rw_semaphore.
>>>>> I am not sure I follow. Are you saying that one invalidate_range might
>>>>> trigger another one from the same path?
>>>> No, but what can happen is:
>>>>
>>>> invalidate_range_start(A,B);
>>>> invalidate_range_start(C,D);
>>>> ...
>>>> invalidate_range_end(C,D);
>>>> invalidate_range_end(A,B);
>>>>
>>>> Grabbing the read lock twice would be illegal in this case.
>>> I am sorry but I still do not follow. What is the context the two are
>>> called from?
>> I don't have the slightest idea.
>>
>>> Can you give me an example. I simply do not see it in the
>>> code, mostly because I am not familiar with it.
>> I'm neither.
>>
>> We stumbled over that by pure observation and after discussing the problem
>> with Jerome came up with this solution.
>>
>> No idea where exactly that case comes from, but I can confirm that it indeed
>> happens.
> Thiking about it some more, I can imagine that a notifier callback which
> performs an allocation might trigger a memory reclaim and that in turn
> might trigger a notifier to be invoked and recurse. But notifier
> shouldn't really allocate memory. They are called from deep MM code
> paths and this would be extremely deadlock prone. Maybe Jerome can come
> up some more realistic scenario. If not then I would propose to simplify
> the locking here. We have lockdep to catch self deadlocks and it is
> always better to handle a specific issue rather than having a code
> without a clear indication how it can recurse.

Well I agree that we should probably fix that, but I have some concerns 
to remove the existing workaround.

See we added that to get rid of a real problem in a customer environment 
and I don't want to that to show up again.

In the meantime I've send out a fix to avoid allocating memory while 
holding the mn_lock.

Thanks for pointing that out,
Christian.