Hi,

On Wed, Sep 04, 2024 at 09:15:26PM +0200, Christian A. Ehrhardt wrote:
>
> Hi Heikki,
>
> On Wed, Sep 04, 2024 at 05:54:29PM +0300, Heikki Krogerus wrote:
> > On Wed, Sep 04, 2024 at 03:58:05PM +0200, Christian A. Ehrhardt wrote:
> > >
> > > Hi Heikki,
> > >
> > > On Wed, Sep 04, 2024 at 03:07:45PM +0300, Heikki Krogerus wrote:
> > > > On Tue, Sep 03, 2024 at 08:19:17PM +0200, Christian A. Ehrhardt wrote:
> > > > > If the busy indicator is set, all other fields in CCI should be
> > > > > clear according to the spec. However, some UCSI implementations do
> > > > > not follow this rule and report bogus data in CCI along with the
> > > > > busy indicator. Ignore the contents of CCI if the busy indicator is
> > > > > set.
> > > > >
> > > > > If a command timeout is hit it is possible that the EVENT_PENDING
> > > > > bit is cleared while connector work is still scheduled which can
> > > > > cause the EVENT_PENDING bit to go out of sync with scheduled connector
> > > > > work. Check and set the EVENT_PENDING bit on entry to
> > > > > ucsi_handle_connector_change() to fix this.
> > > > >
> > > > > Reported-by: Anurag Bijea <icaliberdev@xxxxxxxxx>
> > > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219108
> > > > > Bisected-by: Christian Heusel <christian@xxxxxxxxx>
> > > > > Tested-by: Anurag Bijea <icaliberdev@xxxxxxxxx>
> > > > > Fixes: de52aca4d9d5 ("usb: typec: ucsi: Never send a lone connector change ack")
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Signed-off-by: Christian A. Ehrhardt <lk@xxxxxxx>
> > > > > ---
> > > > >  drivers/usb/typec/ucsi/ucsi.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> > > > > index 4039851551c1..540cb1d2822c 100644
> > > > > --- a/drivers/usb/typec/ucsi/ucsi.c
> > > > > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > > > > @@ -38,6 +38,10 @@
> > > > >
> > > > >  void ucsi_notify_common(struct ucsi *ucsi, u32 cci)
> > > > >  {
> > > > > +	/* Ignore bogus data in CCI if busy indicator is set. */
> > > > > +	if (cci & UCSI_CCI_BUSY)
> > > > > +		return;
> > > >
> > > > I started testing this and it looks like the commands never get
> > > > cancelled when the BUSY bit is set. I don't think this patch is the
> > > > problem, though. I think the BUSY handling broke earlier, probably in
> > > > 5e9c1662a89b ("usb: typec: ucsi: rework command execution functions").
> > > >
> > > > I need to look at this a bit more carefully, but in the meantime, can
> > > > you try this:
> > > >
> > > >         if (cci & UCSI_CCI_BUSY) {
> > > >                 complete(&ucsi->complete);
> > > >                 return;
> > > >         }
> > >
> > > I really don't think this is the correct thing to do and it will
> > > likely make things worse.
> >
> > That was the behaviour before all that command execution refactoring
> > this summer. I'm not saying that it's right, but that's how it was.
>
> The code to do that is still there but does not get called because
> the ETIMEDOUT error is checked for CCI in ucsi_run_command.
> I guess something like this (only compile tested) would fix it:
>
> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> index 540cb1d2822c..d6d61606bbcf 100644
> --- a/drivers/usb/typec/ucsi/ucsi.c
> +++ b/drivers/usb/typec/ucsi/ucsi.c
> @@ -111,15 +111,13 @@ static int ucsi_run_command(struct ucsi *ucsi, u64 command, u32 *cci,
>  	size = clamp(size, 0, 16);
>
>  	ret = ucsi->ops->sync_control(ucsi, command);
> -	if (ret)
> -		return ret;
> -
> -	ret = ucsi->ops->read_cci(ucsi, cci);
> -	if (ret)
> -		return ret;
> +	if (ucsi->ops->read_cci(ucsi, cci))
> +		return -EIO;
>
>  	if (*cci & UCSI_CCI_BUSY)
>  		return -EBUSY;
> +	if (ret)
> +		return ret;
>
>  	if (!(*cci & UCSI_CCI_COMMAND_COMPLETE))
>  		return -EIO;
>

Yes, that looks good.

> > > A notification with the UCSI_CCI_BUSY bit does _not_ mean that
> > > the controller is busy doing other things and cannot complete the
> > > command.
> > >
> > > Instead it is an indication that the controller _is_ working to
> > > complete our command but will take somewhat longer:
> > >
> > > Citing:
> > > | Note: If a command takes longer than MIN_TIME_TO_RESPOND_WITH_BUSY ms
> > > | for the PPM (excluding PPM to OPM communication latency) to complete,
> > > | then the PPM shall respond to the command by setting the CCI Busy
> > > | Indicator and notify the OPM.
> > > | Subsequently, when the PPM actually completes the command, the
> > > | PPM shall notify the OPM of the outcome of the command via an
> > > | asynchronous notification associated with that command.
> > >
> > > Unless I misunderstand what you are trying to do your change would
> > > cause us to needlessly abort/cancel every command that takes more than
> > > MIN_TIME_TO_RESPOND_WITH_BUSY to complete.
> > >
> > > What am I missing?
> >
> > The decision to Cancel was made to work around buggy EC firmwares that
> > reported BUSY, and then never completed the command. So without that
> > Cancel hack, the PPM was stuck on those systems.
>
> Yes fine.
> But the cancel should be done _after_ the command times
> out normally, I guess. Otherwise conforming systems will get their
> commands terminated/aborted for no good reason. And this is what
> the current code tries to do.
>
> > I don't know what we should do about that hack. We probably could just
> > ignore those old systems, and then add quirks for them as needed. But
> > I also don't really like what you are proposing in this patch, that we
> > basically ignore the BUSY bit completely.
>
> See above. I think that solves both cases nicely.

Agreed. Can you incorporate that into this patch?

> > Right now I was hoping that we return the behaviour of the driver to
> > a point where everything worked as before, and after that start
> > improving the driver. That's why I was hoping to hear whether the
> > problem that you are seeing goes away with that approach.
> >
> > With which command do you guys get the busy notification?
>
> It happens for all types of commands. I will append debug output where
> all commands sent and all CCI values read are printed.
>
> Unfortunately, I don't have direct access to the affected hardware.
> I'm just looking into this because one of my changes from earlier
> this year caused a regression on that machine. Is this sufficient to
> show what's going on?

Yes it's fine. I was mostly interested.

> > In any case, I don't think all those ucsi_*_common() functions give us
> > enough room to move here. I feel that the command execution needs to
> > be refactored somehow again.
>
> That's your call to make but personally, I like the recent changes
> to the interface between ucsi.c and the backend drivers.

Just to clarify here, I did not have anything that drastic in mind.

Thanks Christian,

-- 
heikki
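
For readers following the thread, here is a minimal, self-contained sketch of the check ordering that the proposed ucsi_run_command() change arrives at: CCI is read even when sync_control() fails, BUSY takes priority over that failure, and only then is the original error returned. It is not kernel code. struct mock_ppm and mock_run_command() are hypothetical stand-ins for the backend ops, and the bit positions are assumptions that should be checked against ucsi.h.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; verify against ucsi.h. */
#define CCI_BUSY             (UINT32_C(1) << 28)
#define CCI_COMMAND_COMPLETE (UINT32_C(1) << 31)

/* Hypothetical stand-in for what the backend ops would report. */
struct mock_ppm {
	int sync_ret;  /* result of the simulated sync_control() */
	int read_ret;  /* result of the simulated read_cci() */
	uint32_t cci;  /* CCI value the simulated read_cci() returns */
};

/* Mirrors the ordering of checks in the proposed diff. */
int mock_run_command(const struct mock_ppm *ppm, uint32_t *cci)
{
	int ret = ppm->sync_ret;

	/* Read CCI unconditionally, even if sync_control() failed. */
	if (ppm->read_ret)
		return -EIO;
	*cci = ppm->cci;

	/* BUSY is reported before any sync_control() error such as a timeout. */
	if (*cci & CCI_BUSY)
		return -EBUSY;
	if (ret)
		return ret;

	if (!(*cci & CCI_COMMAND_COMPLETE))
		return -EIO;

	return 0;
}
```

With this ordering, a command that timed out in sync_control() while the PPM still reports BUSY yields -EBUSY rather than -ETIMEDOUT, which, per the discussion above, is what allows the existing cancel path to run only after the normal timeout.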