Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection

Wei Yang <richard.weiyang@xxxxxxxxx> · Thu, 22 Nov 2018 23:53:37 +0000

On Thu, Nov 22, 2018 at 10:53:31PM +0100, David Hildenbrand wrote:
>On 22.11.18 22:28, Wei Yang wrote:
>> On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>>> On 22.11.18 11:12, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> This patch is a preparation for removing the global lock during
>>>> online_pages phase. Also this patch changes the documentation of
>>>> node_size_lock to include the protectioin of nr_zones.
>>>
>>> I looked into locking recently, and there is more to it.
>>>
>>> Please read:
>>>
>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>> Author: David Hildenbrand <david@xxxxxxxxxx>
>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>
>>>    memory-hotplug.rst: add some details about locking internals
>>>    
>>>    Let's document the magic a bit, especially why device_hotplug_lock is
>>>    required when adding/removing memory and how it all play together with
>>>    requests to online/offline memory from user space.
>>>
>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>> as of now.
>>>
>>> mem_hotplug_begin() is just an internal optimization. (we don't want
>>> everybody to take the device lock)
>>>
>> 
>> Hi, David
>> 
>> Thanks for your comment.
>
>My last sentence should have been "we don't want everybody to take the
>device hotplug lock" :) That caused confusion.
>
>> 
>> Hmm... I didn't catch your point.
>> 
>> Related to memory hot-plug, there are (at least) three locks,
>> 
>>   * device_hotplug_lock    (global)
>>   * device lock            (device scope)
>>   * mem_hotplug_lock       (global)
>> 
>> But with two different hold sequence in two cases:
>> 
>>   * device_online()
>> 
>>     device_hotplug_lock
>>     device_lock
>>     mem_hotplug_lock
>> 
>>   * add_memory_resource()
>> 
>>     device_hotplug_lock
>>     mem_hotplug_lock
>>     device_lock
>>        ^
>>        |
>>        I don't find where this is hold in add_memory_resource(). 
>>        Would you mind giving me a hint?
>> 
>> If my understanding is correct, what is your point?
>> 
>
>The point I was trying to make:
>
>Right now all onlining/offlining/adding/removing is protected by the
>device_hotplug_lock (and that's a good thing, things are fragile enough
>already).
>
>mem_hotplug_lock() is used in addition for get_online_mems().
>
>"This patch is a preparation for removing the global lock during
>online_pages phase." - is more like "one global lock".
>

Thanks for reminding. You are right.

>> I guess your point is : just remove mem_hotplug_lock is not enough to
>> resolve the scalability issue?
>
>Depends on which scalability issue :)
>
>Getting rid of / removing the impact of mem_hotplug_lock is certainly a
>very good idea. And improves scalability of all callers of
>get_online_mems(). If that is the intention, very good :)
>

Maybe not exact.

The intention is to get rid of mem_hotplug_begin/done, if I am correct.

>If the intention is to make onlining/offlining more scalable (e.g. in
>parallel or such), then scalability is limited by device_hotplug_lock.
>

I didn't notice this lock.

While this is a step by step improvement.

>
>> 
>> Please correct me, if I am not. :-)
>> 
>
>Guess I was just wondering which scalability issue we are trying to solve :)
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me