On Mon, Nov 26, 2018 at 11:17:40PM -0800, Dave Hansen wrote: >On 11/26/18 10:25 PM, Michal Hocko wrote: >> [Cc Dave who has added the lock into this path. Maybe he remembers why] > >I don't remember specifically. But, the pattern of: > > allocate > lock > set > unlock > >is _usually_ so we don't have two "sets" racing with each other. In >this case, that would have been to ensure that two >sparse_init_one_section()'s didn't race and leak one of the two >allocated memmaps or worse. > >I think mem_hotplug_lock protects this case these days, though. I don't >think we had it in the early days and were just slumming it with the >pgdat locks. > >I really don't like the idea of removing the lock by just saying it >doesn't protect anything without doing some homework first, though. It >would actually be really nice to comment the entire call chain from the >mem_hotplug_lock acquisition to here. There is precious little >commenting in there and it could use some love. Dave, Thanks for your comment :-) I should put more words to the reason for removing the lock. Here is a simplified call trace for sparse_add_one_section() during physical add/remove phase. __add_memory() add_memory_resource() mem_hotplug_begin() arch_add_memory() add_pages() __add_pages() __add_section() sparse_add_one_section(pfn) mem_hotplug_done() When we just look at the sparse section initialization, we can see the contention happens when __add_memory() try to add a same range or range overlapped in SECTIONS_PER_ROOT number of sections. Otherwise, they won't access the same memory. If this happens, we may face two contentions: * reallocation of mem_section[root] * reallocation of memmap and usemap While neither of them could be protected by the pgdat_resize_lock from my understanding. Grab pgdat_resize_lock just slow down the process, while finally they will replace the mem_section[root] and ms->section_mem_map with their own new allocated data. Last bu not the least, to be honest, even the global mem_hotplug_lock doesn't help in this situation. In case __add_memory() try to add the same range twice, the sparse section would be initialized twice. Which means it will be overwritten with the new allocated memmap/usermap. But maybe we have the assumption this reentrance will not happen. This is all what I understand, in case there is some misunderstanding, please let me know. -- Wei Yang Help you, Help me