Jonathan Cameron wrote: > On Fri, 10 Feb 2023 01:05:27 -0800 > Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > > > Testing of ram region support [1], stimulates a long standing bug in > > cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to > > inability to walk ports after dports have been unregistered. That > > results in a failure to re-register a memdev after the port is > > re-enabled leading to a crash like the following: > > > > cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256 > > general protection fault, ... > > [..] > > RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core] > > dev_name at include/linux/device.h:700 > > (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155 > > (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249 > > [..] > > Call Trace: > > <TASK> > > attach_target+0x39a/0x760 [cxl_core] > > ? __mutex_unlock_slowpath+0x3a/0x290 > > cxl_add_to_region+0xb8/0x340 [cxl_core] > > ? lockdep_hardirqs_on+0x7d/0x100 > > discover_region+0x4b/0x80 [cxl_port] > > ? __pfx_discover_region+0x10/0x10 [cxl_port] > > device_for_each_child+0x58/0x90 > > cxl_port_probe+0x10e/0x130 [cxl_port] > > cxl_bus_probe+0x17/0x50 [cxl_core] > > > > Change the port ancestry walk to be by depth rather than by dport. This > > ensures that even if a port has unregistered its dports a deferred > > memdev cleanup will still be able to cleanup the memdev's interest in > > that port. > > > > The parent_port->dev.driver check is only needed for determining if the > > bottom up removal beat the top-down removal, but cxl_ep_remove() can > > always proceed. > > Why can cxl_ep_remove() always proceed? What stops it racing? > Is it that we are holding a reference to the port at the time of the > call so the release callback can't be called until we drop that? Right, as long as a port reference is held then the cxl_ep_remove() at cxl_port_release() can not race this one from memdev removal. The result of cxl_ep_load() is guaranteed to stay stable until the subsequent put_device(). > Anyhow, good to have a little more detail on the 'why' in the patch > description (particularly for those reading this when half asleep like me ;) Long day for you, I appreciate it!