On Tue, Nov 14, 2023 at 07:22:33PM +0100, David Hildenbrand wrote: > On 14.11.23 19:02, Sumanth Korikkar wrote: > > The patch subject talks about "fixing locking order", but it's actually > missing locking, no? > > > From Documentation/core-api/memory-hotplug.rst: > > When adding/removing/onlining/offlining memory or adding/removing > > heterogeneous/device memory, we should always hold the mem_hotplug_lock > > in write mode to serialise memory hotplug (e.g. access to global/zone > > variables). > > > > mhp_(de)init_memmap_on_memory() functions can change zone stats and > > struct page content, but they are currently called w/o the > > mem_hotplug_lock. > > > > When memory block is being offlined and when kmemleak goes through each > > populated zone, the following theoretical race conditions could occur: > > CPU 0: | CPU 1: > > memory_offline() | > > -> offline_pages() | > > -> mem_hotplug_begin() | > > ... | > > -> mem_hotplug_done() | > > | kmemleak_scan() > > | -> get_online_mems() > > | ... > > -> mhp_deinit_memmap_on_memory() | > > [not protected by mem_hotplug_begin/done()]| > > Marks memory section as offline, | Retrieves zone_start_pfn > > poisons vmemmap struct pages and updates | and struct page members. > > the zone related data | > > | ... > > | -> put_online_mems() > > > > Fix this by ensuring mem_hotplug_lock is taken before performing > > mhp_init_memmap_on_memory(). Also ensure that > > mhp_deinit_memmap_on_memory() holds the lock. > > What speaks against grabbing that lock in these functions? > At present, the functions online_pages() and offline_pages() acquire the mem_hotplug_lock right at the start. However, given the necessity of locking in mhp_(de)init_memmap_on_memory(), it would be more efficient to consolidate the locking process by holding the mem_hotplug_lock once in memory_block_online() and memory_block_offline(). Moreover, the introduction of the 'memmap on memory' feature on s390 brings a new physical memory notifier, and functions like __add_pages() or arch_add_memory() are consistently invoked with the mem_hotplug_lock already acquired. Considering these factors, it seemed more natural to move mem_hotplug_lock in memory_block_online() and memory_block_offline(), which was described as "fixing locking order" in the subject. I will change the subject to "add missing locking", if it is misleading . Would you or Oscar agree that there is a need for those mhp_(de)init_memmap_on_memory() functions to take lock at all? Thanks