On Sat, Jan 11, 2020 at 6:52 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 11.01.20 15:25, David Hildenbrand wrote: > > > > > >> Am 11.01.2020 um 14:56 schrieb Qian Cai <cai@xxxxxx>: > >> > >> > >> > >>> On Jan 11, 2020, at 6:03 AM, David Hildenbrand <david@xxxxxxxxxx> wrote: > >>> > >>> So I just remember why I think this (and the previously reported done > >>> for ACPI DIMMs) are false positives. The actual locking order is > >>> > >>> onlining/offlining from user space: > >>> > >>> kn->count -> device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock > >>> > >>> memory removal: > >>> > >>> device_hotplug_lock -> cpu_hotplug_lock -> mem_hotplug_lock -> kn->count > >>> > >>> > >>> This looks like a locking inversion - but it's not. Whenever we come via > >>> user space we do a mutex_trylock(), which resolves this issue by backing > >>> up. The device_hotplug_lock will prevent > >>> > >>> I have no clue why the device_hotplug_lock does not pop up in the > >>> lockdep report here. Sounds wrong to me. > >>> > >>> I think this is a false positive and not stable material. > >> > >> The point is that there are other paths does kn->count —> cpu_hotplug_lock without needing device_hotplug_lock to race with memory removal. > >> > >> kmem_cache_shrink_all+0x50/0x100 (cpu_hotplug_lock.rw_sem/mem_hotplug_lock.rw_sem) > >> shrink_store+0x34/0x60 > >> slab_attr_store+0x6c/0x170 > >> sysfs_kf_write+0x70/0xb0 > >> kernfs_fop_write+0x11c/0x270 ((kn->count) > >> __vfs_write+0x3c/0x70 > >> vfs_write+0xcc/0x200 > >> ksys_write+0x7c/0x140 > >> system_call+0x5c/0x6 > >> > > > > But not the lock of the memory devices, or am I missing something? > > > > To clarify: > > memory unplug will remove e.g., /sys/devices/system/memory/memoryX/, > which has a dedicated kn->count AFAIK > > If you do a "echo 1 > /sys/kernel/slab/X/shrink", you would not lock the > kn->count of /sys/devices/system/memory/memoryX/, but the one of some > slab thingy. > > The only scenario I could see is if remove_memory_block_devices() will > not only remove /sys/devices/system/memory/memoryX/, but also implicitly > e.g., /sys/kernel/slab/X/. If that is the case, then this is indeed not > a false positive, but something rather hard to trigger (which would > still classify as stable material). Yes, already agreed to drop stable. However, the trylock does not solve the race it just turns the blocking wait to a spin wait, but the subsequent 5ms sleep does make the theoretical race nearly impossible, Thanks for pointing that out. The theoretical race is still a problem because it hides future lockdep violations, but I otherwise can't point to whether the kn->count in question is a false positive concern for an actual deadlock or not. Tracking that down is possible, but not something I have time for at present.