On Sat, Oct 12, 2024 at 07:40:13AM +0800, Zijun Hu wrote: > On 2024/10/12 01:46, Dan Williams wrote: > > Zijun Hu wrote: > >> On 2024/10/11 13:34, Dan Williams wrote: > >>> In support of investigating an initialization failure report [1], > >>> cxl_test was updated to register mock memory-devices after the mock > >>> root-port/bus device had been registered. That led to cxl_test crashing > >>> with a use-after-free bug with the following signature: > >>> > >>> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 > >>> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 > >>> cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 > >>> 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 > >>> [..] > >>> cxld_unregister: cxl decoder14.0: > >>> cxl_region_decode_reset: cxl_region region3: > >>> mock_decoder_reset: cxl_port port3: decoder3.0 reset > >>> 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 > >>> cxl_endpoint_decoder_release: cxl decoder14.0: > >>> [..] > >>> cxld_unregister: cxl decoder7.0: > >>> 3) cxl_region_decode_reset: cxl_region region3: > >>> Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI > >>> [..] > >>> RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] > >>> [..] > >>> Call Trace: > >>> <TASK> > >>> cxl_region_decode_reset+0x69/0x190 [cxl_core] > >>> cxl_region_detach+0xe8/0x210 [cxl_core] > >>> cxl_decoder_kill_region+0x27/0x40 [cxl_core] > >>> cxld_unregister+0x5d/0x60 [cxl_core] > >>> > >>> At 1) a region has been established with 2 endpoint decoders (7.0 and > >>> 14.0). Those endpoints share a common switch-decoder in the topology > >>> (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits > >>> the "out of order reset case" in the switch decoder. The effect though > >>> is that region3 cleanup is aborted leaving it in-tact and > >>> referencing decoder14.0. At 3) the second attempt to teardown region3 > >>> trips over the stale decoder14.0 object which has long since been > >>> deleted. > >>> > >>> The fix here is to recognize that the CXL specification places no > >>> mandate on in-order shutdown of switch-decoders, the driver enforces > >>> in-order allocation, and hardware enforces in-order commit. So, rather > >>> than fail and leave objects dangling, always remove them. > >>> > >>> In support of making cxl_region_decode_reset() always succeed, > >>> cxl_region_invalidate_memregion() failures are turned into warnings. > >>> Crashing the kernel is ok there since system integrity is at risk if > >>> caches cannot be managed around physical address mutation events like > >>> CXL region destruction. > >>> > >>> A new device_for_each_child_reverse_from() is added to cleanup > >>> port->commit_end after all dependent decoders have been disabled. In > >>> other words if decoders are allocated 0->1->2 and disabled 1->2->0 then > >>> port->commit_end only decrements from 2 after 2 has been disabled, and > >>> it decrements all the way to zero since 1 was disabled previously. > >>> > >>> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@xxxxxxxxxx [1] > >>> Cc: <stable@xxxxxxxxxxxxxxx> > >>> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") > >>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > >>> Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx> > >>> Cc: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx> > >>> Cc: Dave Jiang <dave.jiang@xxxxxxxxx> > >>> Cc: Alison Schofield <alison.schofield@xxxxxxxxx> > >>> Cc: Ira Weiny <ira.weiny@xxxxxxxxx> > >>> Cc: Zijun Hu <zijun_hu@xxxxxxxxxx> > >>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > >>> --- > >>> drivers/base/core.c | 35 +++++++++++++++++++++++++++++ > >>> drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++------- > >>> drivers/cxl/core/region.c | 48 +++++++++++----------------------------- > >>> drivers/cxl/cxl.h | 3 ++- > >>> include/linux/device.h | 3 +++ > >>> tools/testing/cxl/test/cxl.c | 14 ++++-------- > >>> 6 files changed, 100 insertions(+), 53 deletions(-) > >>> > >>> diff --git a/drivers/base/core.c b/drivers/base/core.c > >>> index a4c853411a6b..e42f1ad73078 100644 > >>> --- a/drivers/base/core.c > >>> +++ b/drivers/base/core.c > >>> @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, > >>> } > >>> EXPORT_SYMBOL_GPL(device_for_each_child_reverse); > >>> > >>> +/** > >>> + * device_for_each_child_reverse_from - device child iterator in reversed order. > >>> + * @parent: parent struct device. > >>> + * @from: optional starting point in child list > >>> + * @fn: function to be called for each device. > >>> + * @data: data for the callback. > >>> + * > >>> + * Iterate over @parent's child devices, starting at @from, and call @fn > >>> + * for each, passing it @data. This helper is identical to > >>> + * device_for_each_child_reverse() when @from is NULL. > >>> + * > >>> + * @fn is checked each iteration. If it returns anything other than 0, > >>> + * iteration stop and that value is returned to the caller of > >>> + * device_for_each_child_reverse_from(); > >>> + */ > >>> +int device_for_each_child_reverse_from(struct device *parent, > >>> + struct device *from, const void *data, > >>> + int (*fn)(struct device *, const void *)) > >>> +{ > >>> + struct klist_iter i; > >>> + struct device *child; > >>> + int error = 0; > >>> + > >>> + if (!parent->p) > >>> + return 0; > >>> + > >>> + klist_iter_init_node(&parent->p->klist_children, &i, > >>> + (from ? &from->p->knode_parent : NULL)); > >>> + while ((child = prev_device(&i)) && !error) > >>> + error = fn(child, data); > >>> + klist_iter_exit(&i); > >>> + return error; > >>> +} > >>> +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); > >>> + > >> > >> it does NOT deserve, also does NOT need to introduce a new core driver > >> API device_for_each_child_reverse_from(). existing > >> device_for_each_child_reverse() can do what the _from() wants to do. > >> > >> we can use similar approach as below link shown: > >> https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@xxxxxxxxxxx/ > > > > No, just have a simple starting point parameter. I understand that more > > logic can be placed around device_for_each_child_reverse() to achieve > > the same effect, but the core helpers should be removing logic from > > consumers, not forcing them to add more. > > > > If bloat is a concern, then after your const cleanups go through > > device_for_each_child_reverse() can be rewritten in terms of > > device_for_each_child_reverse_from() as (untested): > > > > bloat is one aspect, the other aspect is that there are redundant > between both driver core APIs, namely, there are a question: > > why to still need device_for_each_child_reverse() if it is same as > _from(..., NULL, ...) ? > This same pattern (_reverse and _from_reverse) is present in list.h and other iterators. Why would it be contentious here? Reducing _reverse() to be a wrapper of _from_reverse is a nice way of reducing the bloat/redundancy without having to update every current user - this is a very common refactor pattern. Refactoring without disrupting in-flight work is intrinsically valuable. > > - * > > - * Iterate over @parent's child devices, and call @fn for each, > > - * passing it @data. > > - * > > - * We check the return of @fn each time. If it returns anything > > - * other than 0, we break out and return that value. > > - */ > > -int device_for_each_child_reverse(struct device *parent, void *data, > > - int (*fn)(struct device *dev, void *data)) > > -{ > > - struct klist_iter i; > > - struct device *child; > > - int error = 0; > > - > > - if (!parent || !parent->p) > > - return 0; > > - > > - klist_iter_init(&parent->p->klist_children, &i); > > - while ((child = prev_device(&i)) && !error) > > - error = fn(child, data); > > - klist_iter_exit(&i); > > - return error; > > -} > > -EXPORT_SYMBOL_GPL(device_for_each_child_reverse); > > - > > /** > > * device_for_each_child_reverse_from - device child iterator in reversed order. > > * @parent: parent struct device. > > diff --git a/include/linux/device.h b/include/linux/device.h > > index 667cb6db9019..96a2c072bf5b 100644 > > --- a/include/linux/device.h > > +++ b/include/linux/device.h > > @@ -1076,11 +1076,14 @@ DEFINE_FREE(device_del, struct device *, if (_T) device_del(_T)) > > > > int device_for_each_child(struct device *dev, void *data, > > int (*fn)(struct device *dev, void *data)); > > -int device_for_each_child_reverse(struct device *dev, void *data, > > - int (*fn)(struct device *dev, void *data)); > > int device_for_each_child_reverse_from(struct device *parent, > > struct device *from, const void *data, > > int (*fn)(struct device *, const void *)); > > +static inline int device_for_each_child_reverse(struct device *dev, const void *data, > > + int (*fn)(struct device *, const void *)) > > +{ > > + return device_for_each_child_reverse_from(dev, NULL, data, fn); > > +} > > struct device *device_find_child(struct device *dev, void *data, > > int (*match)(struct device *dev, void *data)); > > struct device *device_find_child_by_name(struct device *parent, >