On 2024/10/11 13:34, Dan Williams wrote: > In support of investigating an initialization failure report [1], > cxl_test was updated to register mock memory-devices after the mock > root-port/bus device had been registered. That led to cxl_test crashing > with a use-after-free bug with the following signature: > > cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 > cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 > cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 > 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 > [..] > cxld_unregister: cxl decoder14.0: > cxl_region_decode_reset: cxl_region region3: > mock_decoder_reset: cxl_port port3: decoder3.0 reset > 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 > cxl_endpoint_decoder_release: cxl decoder14.0: > [..] > cxld_unregister: cxl decoder7.0: > 3) cxl_region_decode_reset: cxl_region region3: > Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI > [..] > RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] > [..] > Call Trace: > <TASK> > cxl_region_decode_reset+0x69/0x190 [cxl_core] > cxl_region_detach+0xe8/0x210 [cxl_core] > cxl_decoder_kill_region+0x27/0x40 [cxl_core] > cxld_unregister+0x5d/0x60 [cxl_core] > > At 1) a region has been established with 2 endpoint decoders (7.0 and > 14.0). Those endpoints share a common switch-decoder in the topology > (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits > the "out of order reset case" in the switch decoder. The effect though > is that region3 cleanup is aborted leaving it in-tact and > referencing decoder14.0. At 3) the second attempt to teardown region3 > trips over the stale decoder14.0 object which has long since been > deleted. > > The fix here is to recognize that the CXL specification places no > mandate on in-order shutdown of switch-decoders, the driver enforces > in-order allocation, and hardware enforces in-order commit. So, rather > than fail and leave objects dangling, always remove them. > > In support of making cxl_region_decode_reset() always succeed, > cxl_region_invalidate_memregion() failures are turned into warnings. > Crashing the kernel is ok there since system integrity is at risk if > caches cannot be managed around physical address mutation events like > CXL region destruction. > > A new device_for_each_child_reverse_from() is added to cleanup > port->commit_end after all dependent decoders have been disabled. In > other words if decoders are allocated 0->1->2 and disabled 1->2->0 then > port->commit_end only decrements from 2 after 2 has been disabled, and > it decrements all the way to zero since 1 was disabled previously. > > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx [1] > Cc: <stable@xxxxxxxxxxxxxxx> > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> > Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> > Cc: Dave Jiang <dave.jiang@xxxxxxxxx> > Cc: Alison Schofield <alison.schofield@xxxxxxxxx> > Cc: Ira Weiny <ira.weiny@xxxxxxxxx> > Cc: Zijun Hu <zijun_hu@xxxxxxxxxx> > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > --- > drivers/base/core.c | 35 +++++++++++++++++++++++++++++ > drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++------- > drivers/cxl/core/region.c | 48 +++++++++++----------------------------- > drivers/cxl/cxl.h | 3 ++- > include/linux/device.h | 3 +++ > tools/testing/cxl/test/cxl.c | 14 ++++-------- > 6 files changed, 100 insertions(+), 53 deletions(-) > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index a4c853411a6b..e42f1ad73078 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, > } > EXPORT_SYMBOL_GPL(device_for_each_child_reverse); > > +/** > + * device_for_each_child_reverse_from - device child iterator in reversed order. > + * @parent: parent struct device. > + * @from: optional starting point in child list > + * @fn: function to be called for each device. > + * @data: data for the callback. > + * > + * Iterate over @parent's child devices, starting at @from, and call @fn > + * for each, passing it @data. This helper is identical to > + * device_for_each_child_reverse() when @from is NULL. > + * > + * @fn is checked each iteration. If it returns anything other than 0, > + * iteration stop and that value is returned to the caller of > + * device_for_each_child_reverse_from(); > + */ > +int device_for_each_child_reverse_from(struct device *parent, > + struct device *from, const void *data, > + int (*fn)(struct device *, const void *)) > +{ > + struct klist_iter i; > + struct device *child; > + int error = 0; > + > + if (!parent->p) > + return 0; > + > + klist_iter_init_node(&parent->p->klist_children, &i, > + (from ? &from->p->knode_parent : NULL)); > + while ((child = prev_device(&i)) && !error) > + error = fn(child, data); > + klist_iter_exit(&i); > + return error; > +} > +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); > + it does NOT deserve, also does NOT need to introduce a new core driver API device_for_each_child_reverse_from(). existing device_for_each_child_reverse() can do what the _from() wants to do. we can use similar approach as below link shown: https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@xxxxxxxxxxx/ > /** > * device_find_child - device iterator for locating a particular device. > * @parent: parent struct device > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c > index 3df10517a327..223c273c0cd1 100644 > --- a/drivers/cxl/core/hdm.c > +++ b/drivers/cxl/core/hdm.c > @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) > return 0; > } > > -static int cxl_decoder_reset(struct cxl_decoder *cxld) > +static int commit_reap(struct device *dev, const void *data) > +{ > + struct cxl_port *port = to_cxl_port(dev->parent); > + struct cxl_decoder *cxld; > + > + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev)) > + return 0; > + > + cxld = to_cxl_decoder(dev); > + if (port->commit_end == cxld->id && > + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { > + port->commit_end--; > + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", > + dev_name(&cxld->dev), port->commit_end); > + } > + > + return 0; > +} > + > +void cxl_port_commit_reap(struct cxl_decoder *cxld) > +{ > + struct cxl_port *port = to_cxl_port(cxld->dev.parent); > + > + lockdep_assert_held_write(&cxl_region_rwsem); > + > + /* > + * Once the highest committed decoder is disabled, free any other > + * decoders that were pinned allocated by out-of-order release. > + */ > + port->commit_end--; > + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev), > + port->commit_end); > + device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL, > + commit_reap); > +} > +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL); > + > +static void cxl_decoder_reset(struct cxl_decoder *cxld) > { > struct cxl_port *port = to_cxl_port(cxld->dev.parent); > struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); > @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) > u32 ctrl; > > if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) > - return 0; > + return; > > - if (port->commit_end != id) { > + if (port->commit_end == id) > + cxl_port_commit_reap(cxld); > + else > dev_dbg(&port->dev, > "%s: out of order reset, expected decoder%d.%d\n", > dev_name(&cxld->dev), port->id, port->commit_end); > - return -EBUSY; > - } > > down_read(&cxl_dpa_rwsem); > ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); > @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) > writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); > up_read(&cxl_dpa_rwsem); > > - port->commit_end--; > cxld->flags &= ~CXL_DECODER_F_ENABLE; > > /* Userspace is now responsible for reconfiguring this decoder */ > @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) > cxled = to_cxl_endpoint_decoder(&cxld->dev); > cxled->state = CXL_DECODER_STATE_MANUAL; > } > - > - return 0; > } > > static int cxl_setup_hdm_decoder_from_dvsec( > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index e701e4b04032..3478d2058303 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) > "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); > return 0; > } else { > - dev_err(&cxlr->dev, > - "Failed to synchronize CPU cache state\n"); > + dev_WARN(&cxlr->dev, > + "Failed to synchronize CPU cache state\n"); > return -ENXIO; > } > } > @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) > return 0; > } > > -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) > +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count) > { > struct cxl_region_params *p = &cxlr->params; > - int i, rc = 0; > + int i; > > /* > - * Before region teardown attempt to flush, and if the flush > - * fails cancel the region teardown for data consistency > - * concerns > + * Before region teardown attempt to flush, evict any data cached for > + * this region, or scream loudly about missing arch / platform support > + * for CXL teardown. > */ > - rc = cxl_region_invalidate_memregion(cxlr); > - if (rc) > - return rc; > + cxl_region_invalidate_memregion(cxlr); > > for (i = count - 1; i >= 0; i--) { > struct cxl_endpoint_decoder *cxled = p->targets[i]; > @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) > cxl_rr = cxl_rr_load(iter, cxlr); > cxld = cxl_rr->decoder; > if (cxld->reset) > - rc = cxld->reset(cxld); > - if (rc) > - return rc; > + cxld->reset(cxld); > set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); > } > > endpoint_reset: > - rc = cxled->cxld.reset(&cxled->cxld); > - if (rc) > - return rc; > + cxled->cxld.reset(&cxled->cxld); > set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); > } > > /* all decoders associated with this region have been torn down */ > clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); > - > - return 0; > } > > static int commit_decoder(struct cxl_decoder *cxld) > @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, > * still pending. > */ > if (p->state == CXL_CONFIG_RESET_PENDING) { > - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); > - /* > - * Revert to committed since there may still be active > - * decoders associated with this region, or move forward > - * to active to mark the reset successful > - */ > - if (rc) > - p->state = CXL_CONFIG_COMMIT; > - else > - p->state = CXL_CONFIG_ACTIVE; > + cxl_region_decode_reset(cxlr, p->interleave_ways); > + p->state = CXL_CONFIG_ACTIVE; > } > } > > @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) > get_device(&cxlr->dev); > > if (p->state > CXL_CONFIG_ACTIVE) { > - /* > - * TODO: tear down all impacted regions if a device is > - * removed out of order > - */ > - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); > - if (rc) > - goto out; > + cxl_region_decode_reset(cxlr, p->interleave_ways); > p->state = CXL_CONFIG_ACTIVE; > } > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 0d8b810a51f0..5406e3ab3d4a 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -359,7 +359,7 @@ struct cxl_decoder { > struct cxl_region *region; > unsigned long flags; > int (*commit)(struct cxl_decoder *cxld); > - int (*reset)(struct cxl_decoder *cxld); > + void (*reset)(struct cxl_decoder *cxld); > }; > > /* > @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port) > int cxl_num_decoders_committed(struct cxl_port *port); > bool is_cxl_port(const struct device *dev); > struct cxl_port *to_cxl_port(const struct device *dev); > +void cxl_port_commit_reap(struct cxl_decoder *cxld); > struct pci_bus; > int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, > struct pci_bus *bus); > diff --git a/include/linux/device.h b/include/linux/device.h > index b4bde8d22697..667cb6db9019 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data, > int (*fn)(struct device *dev, void *data)); > int device_for_each_child_reverse(struct device *dev, void *data, > int (*fn)(struct device *dev, void *data)); > +int device_for_each_child_reverse_from(struct device *parent, > + struct device *from, const void *data, > + int (*fn)(struct device *, const void *)); > struct device *device_find_child(struct device *dev, void *data, > int (*match)(struct device *dev, void *data)); > struct device *device_find_child_by_name(struct device *parent, > diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c > index 90d5afd52dd0..c5bbd89b3192 100644 > --- a/tools/testing/cxl/test/cxl.c > +++ b/tools/testing/cxl/test/cxl.c > @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld) > return 0; > } > > -static int mock_decoder_reset(struct cxl_decoder *cxld) > +static void mock_decoder_reset(struct cxl_decoder *cxld) > { > struct cxl_port *port = to_cxl_port(cxld->dev.parent); > int id = cxld->id; > > if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) > - return 0; > + return; > > dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); > - if (port->commit_end != id) { > + if (port->commit_end == id) > + cxl_port_commit_reap(cxld); > + else > dev_dbg(&port->dev, > "%s: out of order reset, expected decoder%d.%d\n", > dev_name(&cxld->dev), port->id, port->commit_end); > - return -EBUSY; > - } > - > - port->commit_end--; > cxld->flags &= ~CXL_DECODER_F_ENABLE; > - > - return 0; > } > > static void default_mock_decoder(struct cxl_decoder *cxld) >