On Tue, Oct 08, 2024 at 02:06:17PM +0100, Sudeep Holla wrote: > Hi Florian, > > Thanks for the detailed explanation. > > On Mon, Oct 07, 2024 at 10:07:46AM -0700, Florian Fainelli wrote: > > Hi Cristian, > > > > On October 7, 2024 4:52:33 AM PDT, Cristian Marussi > > <cristian.marussi@xxxxxxx> wrote: > > > On Sat, Oct 05, 2024 at 09:33:17PM -0700, Florian Fainelli wrote: > > > > Broadcom STB platforms have for historical reasons included both > > > > "arm,scmi-smc" and "arm,scmi" in their SCMI Device Tree node compatible > > > > string. > > > > > > Hi Florian, > > > > > > did not know this.. > > > > It stems from us starting with a mailbox driver that did the SMC call, and > > later transitioning to the "smc" transport proper. Our boot loader provides > > the Device Tree blob to the kernel and we maintain backward/forward > > compatibility as much as possible. > > > > IIUC, you need to support old kernel with SMC mailbox driver and new SMC > transport within the SCMI. Is that right understanding ? > > > > > > > > > > > > After the commit cited in the Fixes tag and with a kernel > > > > configuration that enables both the SCMI and the Mailbox transports, we > > > > would probe the mailbox transport, but fail to complete since we would > > > > not have a mailbox driver available. > > > > > > > Not sure to have understood this... > > > > > > ...you mean you DO have the SMC/Mailbox SCMI transport drivers compiled > > > into the Kconfig AND you have BOTH the SMC AND Mailbox compatibles in > > > DT, BUT your platform does NOT physically have a mbox/shmem transport > > > and as a consequence, when MBOX probes (at first), you see an error from > > > the core like: > > > > > > "arm-scmi: unable to communicate with SCMI" > > > > > > since it gets no reply from the SCMI server (being not connnected via > > > mbox) and it bails out .... am I right ? > > > > In an unmodified kernel where both the "mailbox" and "smc" transports are > > enabled, we get the "mailbox" driver to probe first since it matched the > > "arm,scmi" part of the compatible string and it is linked first into the > > kernel. Down the road though we will fail the initialization with: > > > > [ 1.135363] arm-scmi arm-scmi.1.auto: Using scmi_mailbox_transport > > [ 1.141901] arm-scmi arm-scmi.1.auto: SCMI max-rx-timeout: 30ms > > [ 1.148113] arm-scmi arm-scmi.1.auto: failed to setup channel for > > protocol:0x10 > > IIUC, the DTB has mailbox nodes that are available but fail only in the setup > stage ? Or is it marked unavailable and we are missing some checks either > in SCMI or mailbox ? > > IOW, have you already explored that this -EINVAL is correct return value > here and can't be changed to -ENODEV ? I might be not following the failure > path correctly here, but I assume it is > scmi_chan_setup() > info->desc->ops->chan_setup() > mailbox_chan_setup() > mbox_request_channel() > > > [ 1.155828] arm-scmi arm-scmi.1.auto: error -EINVAL: failed to setup > > channels > > [ 1.163379] arm-scmi arm-scmi.1.auto: probe with driver arm-scmi failed > > with error -22 > > > > Because the platform device is now bound, and there is no mechanism to > > return -ENODEV, we won't try another transport driver that would attempt to > > match the other compatibility strings. That makes sense because in general > > you specify the Device Tree precisely, and you also have a tailored kernel > > configuration. Right now this is only an issue using arm's > > multi_v7_defconfig and arm64's defconfig both of which that we intend to > > keep on using for CI purposes. > > > > > > > > > > If this is the case, without this patch, after this error and the mbox probe > > > failing, the SMC transport, instead, DO probe successfully at the end, right ? > > > > With my patch we probe the "smc" transport first and foremost and we > > successfully initialize it, therefore we do not even try the "mailbox" > > transport at all, which is intended. > > > > > > > > IOW, what is the impact without this patch, an error and a delay in the > > > probe sequence till it gets to the SMC transport probe 9as second > > > attempt) or worse ? (trying to understand here...) > > > > There is no recovery without the patch, we are not giving up the arm_scmi > > platform device because there is no mechanism to return -ENODEV and allow > > any of the subsequent transport drivers enabled to attempt to take over the > > platform device and probe it again. > > > > OK this sounds like you have already explored returning -ENODEV is not > an option ? It is fair enough, but just want to understand correctly. > I still think I am missing something. Having a quick look at dd.c it seems to me that the probe error from the first matched driver->probe is propagated back to the callchain (and the driver that fails the probe in any way is NOT bound at that point) till driver_probe_device() THEN, on one side, in __driver_attach() then the retval is ignored: dd.c::__driver_attach() /* * Lock device and try to bind to it. We drop the error * here and always return 0, because we need to keep trying * to bind to devices and some drivers will return an error * simply if it didn't support the device. * * driver_probe_device() will spit a warning if there * is an error. ...while, on the other side, looking at __device_attach_driver() it DOES report the error from driver_probe_device() BUT the __device_attach_driver() routine is called by bus_for_eachdrv() inside __device_attach() and DOES cause such loop (bus_for_each_drv() to bail out with an error...BUT, again, no more driver match/probe is attempted and I suppose that if you restart somehow such sequence you will endup again failing at the same point on the same first-match driver... So seems a sort of structural issue...also because indeed you have something that is somehow a malformed DT so the device_match succeeds for good reasons... I may have miss a lot more, though :D Thanks, Cristian