Re: [PATCH RFC] mm/memory_hotplug: Don't take the cpu_hotplug_lock

David Hildenbrand <david@xxxxxxxxxx> · Fri, 26 Jul 2019 10:22:36 +0200

On 26.07.19 10:19, Michal Hocko wrote:
> On Thu 25-07-19 11:22:06, David Hildenbrand wrote:
>> Commit 9852a7212324 ("mm: drop hotplug lock from lru_add_drain_all()")
>> states that lru_add_drain_all() "Doesn't need any cpu hotplug locking
>> because we do rely on per-cpu kworkers being shut down before our
>> page_alloc_cpu_dead callback is executed on the offlined cpu."
>>
>> And also "Calling this function with cpu hotplug locks held can actually
>> lead to obscure indirect dependencies via WQ context.".
>>
>> Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu
>> rwsem") we do a cpus_read_lock() in mem_hotplug_begin().
>>
>> I don't see how that lock is still helpful, we already hold the
>> device_hotplug_lock to protect try_offline_node(), which is AFAIK one
>> problematic part that can race with CPU hotplug. If it is still
>> necessary, we should document why.
> 
> I have forgot all the juicy details. Maybe Thomas remembers. The
> previous recursive home grown locking was just terrible. I do not see
> stop_machine being used in the memory hotplug anymore.
>  
> I do support this kind of removal because binding CPU and MEM hotplug
> locks is fragile and wrong. But this patch really needs more explanation
> on why this is safe. In other words what does cpu_read_lock protects
> from in mem hotplug paths.

And that is the purpose of marking this RFC, because I am not aware of
any :) Hopefully Thomas can clarify if we are missing something
important (undocumented) here - if so I'll document it.

-- 

Thanks,

David / dhildenb