On 2024/10/12 01:46, Dan Williams wrote: > Zijun Hu wrote: >> On 2024/10/11 13:34, Dan Williams wrote: >>> In support of investigating an initialization failure report [1], >>> cxl_test was updated to register mock memory-devices after the mock >>> root-port/bus device had been registered. That led to cxl_test crashing >>> with a use-after-free bug with the following signature: >>> >>> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 >>> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 >>> cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 >>> 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 >>> [..] >>> cxld_unregister: cxl decoder14.0: >>> cxl_region_decode_reset: cxl_region region3: >>> mock_decoder_reset: cxl_port port3: decoder3.0 reset >>> 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 >>> cxl_endpoint_decoder_release: cxl decoder14.0: >>> [..] >>> cxld_unregister: cxl decoder7.0: >>> 3) cxl_region_decode_reset: cxl_region region3: >>> Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI >>> [..] >>> RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] >>> [..] >>> Call Trace: >>> <TASK> >>> cxl_region_decode_reset+0x69/0x190 [cxl_core] >>> cxl_region_detach+0xe8/0x210 [cxl_core] >>> cxl_decoder_kill_region+0x27/0x40 [cxl_core] >>> cxld_unregister+0x5d/0x60 [cxl_core] >>> >>> At 1) a region has been established with 2 endpoint decoders (7.0 and >>> 14.0). Those endpoints share a common switch-decoder in the topology >>> (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits >>> the "out of order reset case" in the switch decoder. The effect though >>> is that region3 cleanup is aborted leaving it in-tact and >>> referencing decoder14.0. At 3) the second attempt to teardown region3 >>> trips over the stale decoder14.0 object which has long since been >>> deleted. >>> >>> The fix here is to recognize that the CXL specification places no >>> mandate on in-order shutdown of switch-decoders, the driver enforces >>> in-order allocation, and hardware enforces in-order commit. So, rather >>> than fail and leave objects dangling, always remove them. >>> >>> In support of making cxl_region_decode_reset() always succeed, >>> cxl_region_invalidate_memregion() failures are turned into warnings. >>> Crashing the kernel is ok there since system integrity is at risk if >>> caches cannot be managed around physical address mutation events like >>> CXL region destruction. >>> >>> A new device_for_each_child_reverse_from() is added to cleanup >>> port->commit_end after all dependent decoders have been disabled. In >>> other words if decoders are allocated 0->1->2 and disabled 1->2->0 then >>> port->commit_end only decrements from 2 after 2 has been disabled, and >>> it decrements all the way to zero since 1 was disabled previously. >>> >>> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx [1] >>> Cc: <stable@xxxxxxxxxxxxxxx> >>> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") >>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> >>> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> >>> Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> >>> Cc: Dave Jiang <dave.jiang@xxxxxxxxx> >>> Cc: Alison Schofield <alison.schofield@xxxxxxxxx> >>> Cc: Ira Weiny <ira.weiny@xxxxxxxxx> >>> Cc: Zijun Hu <zijun_hu@xxxxxxxxxx> >>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> >>> --- >>> drivers/base/core.c | 35 +++++++++++++++++++++++++++++ >>> drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++------- >>> drivers/cxl/core/region.c | 48 +++++++++++----------------------------- >>> drivers/cxl/cxl.h | 3 ++- >>> include/linux/device.h | 3 +++ >>> tools/testing/cxl/test/cxl.c | 14 ++++-------- >>> 6 files changed, 100 insertions(+), 53 deletions(-) >>> >>> diff --git a/drivers/base/core.c b/drivers/base/core.c >>> index a4c853411a6b..e42f1ad73078 100644 >>> --- a/drivers/base/core.c >>> +++ b/drivers/base/core.c >>> @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, >>> } >>> EXPORT_SYMBOL_GPL(device_for_each_child_reverse); >>> >>> +/** >>> + * device_for_each_child_reverse_from - device child iterator in reversed order. >>> + * @parent: parent struct device. >>> + * @from: optional starting point in child list >>> + * @fn: function to be called for each device. >>> + * @data: data for the callback. >>> + * >>> + * Iterate over @parent's child devices, starting at @from, and call @fn >>> + * for each, passing it @data. This helper is identical to >>> + * device_for_each_child_reverse() when @from is NULL. >>> + * >>> + * @fn is checked each iteration. If it returns anything other than 0, >>> + * iteration stop and that value is returned to the caller of >>> + * device_for_each_child_reverse_from(); >>> + */ >>> +int device_for_each_child_reverse_from(struct device *parent, >>> + struct device *from, const void *data, >>> + int (*fn)(struct device *, const void *)) >>> +{ >>> + struct klist_iter i; >>> + struct device *child; >>> + int error = 0; >>> + >>> + if (!parent->p) >>> + return 0; >>> + >>> + klist_iter_init_node(&parent->p->klist_children, &i, >>> + (from ? &from->p->knode_parent : NULL)); >>> + while ((child = prev_device(&i)) && !error) >>> + error = fn(child, data); >>> + klist_iter_exit(&i); >>> + return error; >>> +} >>> +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); >>> + >> >> it does NOT deserve, also does NOT need to introduce a new core driver >> API device_for_each_child_reverse_from(). existing >> device_for_each_child_reverse() can do what the _from() wants to do. >> >> we can use similar approach as below link shown: >> https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@xxxxxxxxxxx/ > > No, just have a simple starting point parameter. I understand that more > logic can be placed around device_for_each_child_reverse() to achieve > the same effect, but the core helpers should be removing logic from > consumers, not forcing them to add more. > > If bloat is a concern, then after your const cleanups go through > device_for_each_child_reverse() can be rewritten in terms of > device_for_each_child_reverse_from() as (untested): > bloat is one aspect, the other aspect is that there are redundant between both driver core APIs, namely, there are a question: why to still need device_for_each_child_reverse() if it is same as _from(..., NULL, ...) ? So i suggest use existing API now. if there are more users who have such starting point requirement, then add the parameter into device_for_each_child_reverse() which is consistent with other existing *_for_each_*() core APIs such as (class|driver|bus)_for_each_device() and bus_for_each_drv(), that may need much efforts. could you please contains your proposal "fixing this allocation order validation" of below link into this patch series with below reason? and Cc me (^^) https://lore.kernel.org/all/670835f5a2887_964f229474@xxxxxxxxxxxxxxxxxxxxxxxxx.notmuch/ A) the proposal depends on this patch series. B) one of the issues the proposal fix is match_free_decoder() error logic which is also relevant issues this patch series fix, the proposal also can fix the other device_find_child()'s match() issue i care about. C) Actually, it is a bit difficult for me to understand the proposal since i don't have any basic knowledge about CXL. (^^) > diff --git a/drivers/base/core.c b/drivers/base/core.c > index e42f1ad73078..2571c910da46 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -4007,36 +4007,6 @@ int device_for_each_child(struct device *parent, void *data, > } > EXPORT_SYMBOL_GPL(device_for_each_child); > > -/** > - * device_for_each_child_reverse - device child iterator in reversed order. > - * @parent: parent struct device. > - * @fn: function to be called for each device. > - * @data: data for the callback. > - * > - * Iterate over @parent's child devices, and call @fn for each, > - * passing it @data. > - * > - * We check the return of @fn each time. If it returns anything > - * other than 0, we break out and return that value. > - */ > -int device_for_each_child_reverse(struct device *parent, void *data, > - int (*fn)(struct device *dev, void *data)) > -{ > - struct klist_iter i; > - struct device *child; > - int error = 0; > - > - if (!parent || !parent->p) > - return 0; > - > - klist_iter_init(&parent->p->klist_children, &i); > - while ((child = prev_device(&i)) && !error) > - error = fn(child, data); > - klist_iter_exit(&i); > - return error; > -} > -EXPORT_SYMBOL_GPL(device_for_each_child_reverse); > - > /** > * device_for_each_child_reverse_from - device child iterator in reversed order. > * @parent: parent struct device. > diff --git a/include/linux/device.h b/include/linux/device.h > index 667cb6db9019..96a2c072bf5b 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -1076,11 +1076,14 @@ DEFINE_FREE(device_del, struct device *, if (_T) device_del(_T)) > > int device_for_each_child(struct device *dev, void *data, > int (*fn)(struct device *dev, void *data)); > -int device_for_each_child_reverse(struct device *dev, void *data, > - int (*fn)(struct device *dev, void *data)); > int device_for_each_child_reverse_from(struct device *parent, > struct device *from, const void *data, > int (*fn)(struct device *, const void *)); > +static inline int device_for_each_child_reverse(struct device *dev, const void *data, > + int (*fn)(struct device *, const void *)) > +{ > + return device_for_each_child_reverse_from(dev, NULL, data, fn); > +} > struct device *device_find_child(struct device *dev, void *data, > int (*match)(struct device *dev, void *data)); > struct device *device_find_child_by_name(struct device *parent,