Re: [PATCH v2 01/20] cxl/memdev: Fix endpoint port removal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jonathan Cameron wrote:
> On Fri, 10 Feb 2023 01:05:27 -0800
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> 
> > Testing of ram region support [1], stimulates a long standing bug in
> > cxl_detach_ep() where some cxl_ep_remove() cleanup is skipped due to
> > inability to walk ports after dports have been unregistered. That
> > results in a failure to re-register a memdev after the port is
> > re-enabled leading to a crash like the following:
> > 
> >     cxl_port_setup_targets: cxl region4: cxl_host_bridge.0:port4 iw: 1 ig: 256
> >     general protection fault, ...
> >     [..]
> >     RIP: 0010:cxl_region_setup_targets+0x897/0x9e0 [cxl_core]
> >     dev_name at include/linux/device.h:700
> >     (inlined by) cxl_port_setup_targets at drivers/cxl/core/region.c:1155
> >     (inlined by) cxl_region_setup_targets at drivers/cxl/core/region.c:1249
> >     [..]
> >     Call Trace:
> >      <TASK>
> >      attach_target+0x39a/0x760 [cxl_core]
> >      ? __mutex_unlock_slowpath+0x3a/0x290
> >      cxl_add_to_region+0xb8/0x340 [cxl_core]
> >      ? lockdep_hardirqs_on+0x7d/0x100
> >      discover_region+0x4b/0x80 [cxl_port]
> >      ? __pfx_discover_region+0x10/0x10 [cxl_port]
> >      device_for_each_child+0x58/0x90
> >      cxl_port_probe+0x10e/0x130 [cxl_port]
> >      cxl_bus_probe+0x17/0x50 [cxl_core]
> > 
> > Change the port ancestry walk to be by depth rather than by dport. This
> > ensures that even if a port has unregistered its dports a deferred
> > memdev cleanup will still be able to cleanup the memdev's interest in
> > that port.
> > 
> > The parent_port->dev.driver check is only needed for determining if the
> > bottom up removal beat the top-down removal, but cxl_ep_remove() can
> > always proceed.
> 
> Why can cxl_ep_remove() always proceed?  What stops it racing?
> Is it that we are holding a reference to the port at the time of the
> call so the release callback can't be called until we drop that?

Right, as long as a port reference is held then the cxl_ep_remove() at
cxl_port_release() can not race this one from memdev removal.
The result of cxl_ep_load() is guaranteed to stay stable until
the subsequent put_device().

> Anyhow, good to have a little more detail on the 'why' in the patch
> description (particularly for those reading this when half asleep like me ;)

Long day for you, I appreciate it!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux