On Thu, Jun 27, 2024 at 06:49:02PM GMT, Fabrice Gasnier wrote: > On 6/27/24 17:58, Dmitry Baryshkov wrote: > > On Thu, 27 Jun 2024 at 18:51, Fabrice Gasnier > > <fabrice.gasnier@xxxxxxxxxxx> wrote: > >> > >> On 6/25/24 18:49, Dmitry Baryshkov wrote: > >>> On Tue, Jun 25, 2024 at 05:24:54PM GMT, Fabrice Gasnier wrote: > >>>> On 6/21/24 00:55, Dmitry Baryshkov wrote: > >>>>> Extract common functions to handle command sending and to handle events > >>>>> from UCSI. This ensures that all UCSI glue drivers handle the ACKs in > >>>>> the same way. > >>>>> > >>>>> The CCG driver used DEV_CMD_PENDING both for internal > >>>>> firmware-related commands and for UCSI control handling. Leave the > >>>>> former use case intact. > >>>>> > >>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx> > >>>>> --- > >>>>> drivers/usb/typec/ucsi/ucsi.c | 43 +++++++++++++++++++++++++++ > >>>>> drivers/usb/typec/ucsi/ucsi.h | 7 +++++ > >>>>> drivers/usb/typec/ucsi/ucsi_acpi.c | 46 ++--------------------------- > >>>>> drivers/usb/typec/ucsi/ucsi_ccg.c | 21 ++----------- > >>>>> drivers/usb/typec/ucsi/ucsi_glink.c | 47 ++--------------------------- > >>>>> drivers/usb/typec/ucsi/ucsi_stm32g0.c | 44 ++-------------------------- > >>>>> drivers/usb/typec/ucsi/ucsi_yoga_c630.c | 52 ++------------------------------- > >>>>> 7 files changed, 62 insertions(+), 198 deletions(-) > >>>>> > >>>>> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c > >>>>> index 4ba22323dbf9..691ee0c4ef87 100644 > >>>>> --- a/drivers/usb/typec/ucsi/ucsi.c > >>>>> +++ b/drivers/usb/typec/ucsi/ucsi.c > >>>>> @@ -36,6 +36,48 @@ > >>>>> */ > >>>>> #define UCSI_SWAP_TIMEOUT_MS 5000 > >>>>> > >>>>> +void ucsi_notify_common(struct ucsi *ucsi, u32 cci) > >>>>> +{ > >>>>> + if (UCSI_CCI_CONNECTOR(cci)) > >>>>> + ucsi_connector_change(ucsi, UCSI_CCI_CONNECTOR(cci)); > >>>>> + > >>>>> + if (cci & UCSI_CCI_ACK_COMPLETE && > >>>>> + test_bit(ACK_PENDING, &ucsi->flags)) > >>>>> + complete(&ucsi->complete); > >>>>> + > >>>>> + if (cci & UCSI_CCI_COMMAND_COMPLETE && > >>>>> + test_bit(COMMAND_PENDING, &ucsi->flags)) > >>>>> + complete(&ucsi->complete); > >>>> > >>>> Hi Dmitry, > >>>> > >>>> I've recently faced some race with ucsi_stm32g0 driver, and have sent a > >>>> fix for it [1], as you've noticed in the cover letter. > >>>> > >>>> To fix that, I've used test_and_clear_bit() in above two cases, instead > >>>> of test_bit(). > >>> > >>> Could you possible describe, why do you need test_and_clear_bit() > >>> instead of just test_bit()? The bits are cleared at the end of the > >>> .sync_write(), also there can be no other command (or ACK_CC) submission > >>> before this one is fully processed. > >> > >> Hi Dmitry, > >> > >> It took me some time to reproduce this race I observed earlier. > >> (I observe this during DR swap.) > >> > >> Once the ->async_control(UCSI_ACK_CC_CI) call bellow gets completed, and > >> before the ACK_PENDING bit gets cleared, e.g. clear_bit(ACK_PENDING), I > >> get an asynchronous interrupt. > >> > >> Basically, Then the above complete() gets called (due to > >> UCSI_CCI_ACK_COMPLETE & ACK_PENDING). > >> > >> Subsequent UCSI_GET_CONNECTOR_STATUS command (from > >> ucsi_handle_connector_change) will be unblocked immediately due to > >> complete() call has already happen, without UCSI_CCI_COMMAND_COMPLETE > >> cci flag, hence returning -EIO. > > > > But the ACK_CI is being sent as a response to a command. This means > > that the ppm_lock should be locked. The UCSI_GET_CONNECTOR_STATUS > > command should wait for ppm_lock to be freed and only then it can > > proceed with sending the command. Maybe I'm misunderstanding the case > > or maybe there is a loophole somewhere. > > There's probably also some miss-understanding at my end, I don't get how > the ppm_lock would protext from non atomic async_control()/clear_bit(). > > Let me share some trace here, I hope it can better show the difference > at my end when I get the race. > I've added some debug prints around these lines, to trace the set/clear > of the COMMAND/ACK_PENDING, main commands and cci flags. > > normal case is: > --- > ucsi_send_command_common: UCSI_SET_UOR > set:COMMAND_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE > clr:COMMAND_PENDING > ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE > set:ACK_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: UCSI_CCI_ACK_COMPLETE > clr:ACK_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: ucsi_connector_change > ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS > set:COMMAND_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE > clr:COMMAND_PENDING > ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE > set:ACK_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: UCSI_CCI_ACK_COMPLETE > clr:ACK_PENDING > > racy case: > --- > ucsi_send_command_common: UCSI_SET_UOR > set:COMMAND_PENDING > ucsi_stm32g0_irq_handler > ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE > clr:COMMAND_PENDING > ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE > set:ACK_PENDING > ucsi_stm32g0_irq_handler <-- up to here nothing supicious > ucsi_notify_common: ucsi_connector_change > ucsi_notify_common: UCSI_CCI_ACK_COMPLETE > ucsi_stm32g0_irq_handler <-- 2nd IRQ before clr:ACK_PENDING > ucsi_notify_common: ucsi_connector_change > ucsi_notify_common: UCSI_CCI_ACK_COMPLETE > clr:ACK_PENDING > ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS > set:COMMAND_PENDING <-- completion already done :-( > clr:COMMAND_PENDING > ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5) > > I notice the two conditions are met above: both ucsi_connector_change() > and cci ack completed. > > This happens before clear_bit(ACK_PENDING), which lead to anticipate > completion of the subsequent UCSI_GET_CONNECTOR_STATUS command. > > So the test_and_clear_bit() would address a robustness case? Ah, I see. So the race is for the completion. I fear that your solution also doesn't fully solve the problem. The event can arrive after setting the bit and before sending the command to the hardware. I thought about doing various tricks, like reinit_completion or something close, but the issue is that test_and_clear_bit() just makes the race window smaller, but doesn't eliminate it completely. It seems the only practical way to handle that is by making sure that ucsi_notify_common() can sleep and locks the ppm_lock while processing the CCI. > > Please advise, > Best Regards, > Fabrice > > > > >> This is where the test_and_clear_bit() atomic operation helps, to avoid > >> non atomic operation: > >> > >> -> async_control(UCSI_ACK_CC_CI) > >> new interrupt may occur here > >> -> clear_bit(ACK_PENDING) -- With best wishes Dmitry