On 26.07.19 10:19, Michal Hocko wrote: > On Thu 25-07-19 11:22:06, David Hildenbrand wrote: >> Commit 9852a7212324 ("mm: drop hotplug lock from lru_add_drain_all()") >> states that lru_add_drain_all() "Doesn't need any cpu hotplug locking >> because we do rely on per-cpu kworkers being shut down before our >> page_alloc_cpu_dead callback is executed on the offlined cpu." >> >> And also "Calling this function with cpu hotplug locks held can actually >> lead to obscure indirect dependencies via WQ context.". >> >> Since commit 3f906ba23689 ("mm/memory-hotplug: switch locking to a percpu >> rwsem") we do a cpus_read_lock() in mem_hotplug_begin(). >> >> I don't see how that lock is still helpful, we already hold the >> device_hotplug_lock to protect try_offline_node(), which is AFAIK one >> problematic part that can race with CPU hotplug. If it is still >> necessary, we should document why. > > I have forgot all the juicy details. Maybe Thomas remembers. The > previous recursive home grown locking was just terrible. I do not see > stop_machine being used in the memory hotplug anymore. > > I do support this kind of removal because binding CPU and MEM hotplug > locks is fragile and wrong. But this patch really needs more explanation > on why this is safe. In other words what does cpu_read_lock protects > from in mem hotplug paths. And that is the purpose of marking this RFC, because I am not aware of any :) Hopefully Thomas can clarify if we are missing something important (undocumented) here - if so I'll document it. -- Thanks, David / dhildenb