Zijun Hu wrote: > On 2024/10/11 13:34, Dan Williams wrote: > > In support of investigating an initialization failure report [1], > > cxl_test was updated to register mock memory-devices after the mock > > root-port/bus device had been registered. That led to cxl_test crashing > > with a use-after-free bug with the following signature: > > > > cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 > > cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 > > cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 > > 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 > > [..] > > cxld_unregister: cxl decoder14.0: > > cxl_region_decode_reset: cxl_region region3: > > mock_decoder_reset: cxl_port port3: decoder3.0 reset > > 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 > > cxl_endpoint_decoder_release: cxl decoder14.0: > > [..] > > cxld_unregister: cxl decoder7.0: > > 3) cxl_region_decode_reset: cxl_region region3: > > Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI > > [..] > > RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] > > [..] > > Call Trace: > > <TASK> > > cxl_region_decode_reset+0x69/0x190 [cxl_core] > > cxl_region_detach+0xe8/0x210 [cxl_core] > > cxl_decoder_kill_region+0x27/0x40 [cxl_core] > > cxld_unregister+0x5d/0x60 [cxl_core] > > > > At 1) a region has been established with 2 endpoint decoders (7.0 and > > 14.0). Those endpoints share a common switch-decoder in the topology > > (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits > > the "out of order reset case" in the switch decoder. The effect though > > is that region3 cleanup is aborted leaving it in-tact and > > referencing decoder14.0. At 3) the second attempt to teardown region3 > > trips over the stale decoder14.0 object which has long since been > > deleted. > > > > The fix here is to recognize that the CXL specification places no > > mandate on in-order shutdown of switch-decoders, the driver enforces > > in-order allocation, and hardware enforces in-order commit. So, rather > > than fail and leave objects dangling, always remove them. > > > > In support of making cxl_region_decode_reset() always succeed, > > cxl_region_invalidate_memregion() failures are turned into warnings. > > Crashing the kernel is ok there since system integrity is at risk if > > caches cannot be managed around physical address mutation events like > > CXL region destruction. > > > > A new device_for_each_child_reverse_from() is added to cleanup > > port->commit_end after all dependent decoders have been disabled. In > > other words if decoders are allocated 0->1->2 and disabled 1->2->0 then > > port->commit_end only decrements from 2 after 2 has been disabled, and > > it decrements all the way to zero since 1 was disabled previously. > > > > Link: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx [1] > > Cc: <stable@xxxxxxxxxxxxxxx> > > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > > Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> > > Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> > > Cc: Dave Jiang <dave.jiang@xxxxxxxxx> > > Cc: Alison Schofield <alison.schofield@xxxxxxxxx> > > Cc: Ira Weiny <ira.weiny@xxxxxxxxx> > > Cc: Zijun Hu <zijun_hu@xxxxxxxxxx> > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > --- > > drivers/base/core.c | 35 +++++++++++++++++++++++++++++ > > drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++------- > > drivers/cxl/core/region.c | 48 +++++++++++----------------------------- > > drivers/cxl/cxl.h | 3 ++- > > include/linux/device.h | 3 +++ > > tools/testing/cxl/test/cxl.c | 14 ++++-------- > > 6 files changed, 100 insertions(+), 53 deletions(-) > > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > index a4c853411a6b..e42f1ad73078 100644 > > --- a/drivers/base/core.c > > +++ b/drivers/base/core.c > > @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, > > } > > EXPORT_SYMBOL_GPL(device_for_each_child_reverse); > > > > +/** > > + * device_for_each_child_reverse_from - device child iterator in reversed order. > > + * @parent: parent struct device. > > + * @from: optional starting point in child list > > + * @fn: function to be called for each device. > > + * @data: data for the callback. > > + * > > + * Iterate over @parent's child devices, starting at @from, and call @fn > > + * for each, passing it @data. This helper is identical to > > + * device_for_each_child_reverse() when @from is NULL. > > + * > > + * @fn is checked each iteration. If it returns anything other than 0, > > + * iteration stop and that value is returned to the caller of > > + * device_for_each_child_reverse_from(); > > + */ > > +int device_for_each_child_reverse_from(struct device *parent, > > + struct device *from, const void *data, > > + int (*fn)(struct device *, const void *)) > > +{ > > + struct klist_iter i; > > + struct device *child; > > + int error = 0; > > + > > + if (!parent->p) > > + return 0; > > + > > + klist_iter_init_node(&parent->p->klist_children, &i, > > + (from ? &from->p->knode_parent : NULL)); > > + while ((child = prev_device(&i)) && !error) > > + error = fn(child, data); > > + klist_iter_exit(&i); > > + return error; > > +} > > +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); > > + > > it does NOT deserve, also does NOT need to introduce a new core driver > API device_for_each_child_reverse_from(). existing > device_for_each_child_reverse() can do what the _from() wants to do. > > we can use similar approach as below link shown: > https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@xxxxxxxxxxx/ No, just have a simple starting point parameter. I understand that more logic can be placed around device_for_each_child_reverse() to achieve the same effect, but the core helpers should be removing logic from consumers, not forcing them to add more. If bloat is a concern, then after your const cleanups go through device_for_each_child_reverse() can be rewritten in terms of device_for_each_child_reverse_from() as (untested): diff --git a/drivers/base/core.c b/drivers/base/core.c index e42f1ad73078..2571c910da46 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4007,36 +4007,6 @@ int device_for_each_child(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child); -/** - * device_for_each_child_reverse - device child iterator in reversed order. - * @parent: parent struct device. - * @fn: function to be called for each device. - * @data: data for the callback. - * - * Iterate over @parent's child devices, and call @fn for each, - * passing it @data. - * - * We check the return of @fn each time. If it returns anything - * other than 0, we break out and return that value. - */ -int device_for_each_child_reverse(struct device *parent, void *data, - int (*fn)(struct device *dev, void *data)) -{ - struct klist_iter i; - struct device *child; - int error = 0; - - if (!parent || !parent->p) - return 0; - - klist_iter_init(&parent->p->klist_children, &i); - while ((child = prev_device(&i)) && !error) - error = fn(child, data); - klist_iter_exit(&i); - return error; -} -EXPORT_SYMBOL_GPL(device_for_each_child_reverse); - /** * device_for_each_child_reverse_from - device child iterator in reversed order. * @parent: parent struct device. diff --git a/include/linux/device.h b/include/linux/device.h index 667cb6db9019..96a2c072bf5b 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1076,11 +1076,14 @@ DEFINE_FREE(device_del, struct device *, if (_T) device_del(_T)) int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); -int device_for_each_child_reverse(struct device *dev, void *data, - int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse_from(struct device *parent, struct device *from, const void *data, int (*fn)(struct device *, const void *)); +static inline int device_for_each_child_reverse(struct device *dev, const void *data, + int (*fn)(struct device *, const void *)) +{ + return device_for_each_child_reverse_from(dev, NULL, data, fn); +} struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent,