On 6/27/24 22:14, Dmitry Baryshkov wrote: > On Thu, Jun 27, 2024 at 06:49:02PM GMT, Fabrice Gasnier wrote: >> On 6/27/24 17:58, Dmitry Baryshkov wrote: >>> On Thu, 27 Jun 2024 at 18:51, Fabrice Gasnier >>> <fabrice.gasnier@xxxxxxxxxxx> wrote: >>>> >>>> On 6/25/24 18:49, Dmitry Baryshkov wrote: >>>>> On Tue, Jun 25, 2024 at 05:24:54PM GMT, Fabrice Gasnier wrote: >>>>>> On 6/21/24 00:55, Dmitry Baryshkov wrote: >>>>>>> Extract common functions to handle command sending and to handle events >>>>>>> from UCSI. This ensures that all UCSI glue drivers handle the ACKs in >>>>>>> the same way. >>>>>>> >>>>>>> The CCG driver used DEV_CMD_PENDING both for internal >>>>>>> firmware-related commands and for UCSI control handling. Leave the >>>>>>> former use case intact. >>>>>>> >>>>>>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx> >>>>>>> --- >>>>>>> drivers/usb/typec/ucsi/ucsi.c | 43 +++++++++++++++++++++++++++ >>>>>>> drivers/usb/typec/ucsi/ucsi.h | 7 +++++ >>>>>>> drivers/usb/typec/ucsi/ucsi_acpi.c | 46 ++--------------------------- >>>>>>> drivers/usb/typec/ucsi/ucsi_ccg.c | 21 ++----------- >>>>>>> drivers/usb/typec/ucsi/ucsi_glink.c | 47 ++--------------------------- >>>>>>> drivers/usb/typec/ucsi/ucsi_stm32g0.c | 44 ++-------------------------- >>>>>>> drivers/usb/typec/ucsi/ucsi_yoga_c630.c | 52 ++------------------------------- >>>>>>> 7 files changed, 62 insertions(+), 198 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c >>>>>>> index 4ba22323dbf9..691ee0c4ef87 100644 >>>>>>> --- a/drivers/usb/typec/ucsi/ucsi.c >>>>>>> +++ b/drivers/usb/typec/ucsi/ucsi.c >>>>>>> @@ -36,6 +36,48 @@ >>>>>>> */ >>>>>>> #define UCSI_SWAP_TIMEOUT_MS 5000 >>>>>>> >>>>>>> +void ucsi_notify_common(struct ucsi *ucsi, u32 cci) >>>>>>> +{ >>>>>>> + if (UCSI_CCI_CONNECTOR(cci)) >>>>>>> + ucsi_connector_change(ucsi, UCSI_CCI_CONNECTOR(cci)); >>>>>>> + >>>>>>> + if (cci & UCSI_CCI_ACK_COMPLETE && >>>>>>> + test_bit(ACK_PENDING, &ucsi->flags)) >>>>>>> + complete(&ucsi->complete); >>>>>>> + >>>>>>> + if (cci & UCSI_CCI_COMMAND_COMPLETE && >>>>>>> + test_bit(COMMAND_PENDING, &ucsi->flags)) >>>>>>> + complete(&ucsi->complete); >>>>>> >>>>>> Hi Dmitry, >>>>>> >>>>>> I've recently faced some race with ucsi_stm32g0 driver, and have sent a >>>>>> fix for it [1], as you've noticed in the cover letter. >>>>>> >>>>>> To fix that, I've used test_and_clear_bit() in above two cases, instead >>>>>> of test_bit(). >>>>> >>>>> Could you possible describe, why do you need test_and_clear_bit() >>>>> instead of just test_bit()? The bits are cleared at the end of the >>>>> .sync_write(), also there can be no other command (or ACK_CC) submission >>>>> before this one is fully processed. >>>> >>>> Hi Dmitry, >>>> >>>> It took me some time to reproduce this race I observed earlier. >>>> (I observe this during DR swap.) >>>> >>>> Once the ->async_control(UCSI_ACK_CC_CI) call bellow gets completed, and >>>> before the ACK_PENDING bit gets cleared, e.g. clear_bit(ACK_PENDING), I >>>> get an asynchronous interrupt. >>>> >>>> Basically, Then the above complete() gets called (due to >>>> UCSI_CCI_ACK_COMPLETE & ACK_PENDING). >>>> >>>> Subsequent UCSI_GET_CONNECTOR_STATUS command (from >>>> ucsi_handle_connector_change) will be unblocked immediately due to >>>> complete() call has already happen, without UCSI_CCI_COMMAND_COMPLETE >>>> cci flag, hence returning -EIO. >>> >>> But the ACK_CI is being sent as a response to a command. This means >>> that the ppm_lock should be locked. The UCSI_GET_CONNECTOR_STATUS >>> command should wait for ppm_lock to be freed and only then it can >>> proceed with sending the command. Maybe I'm misunderstanding the case >>> or maybe there is a loophole somewhere. >> >> There's probably also some miss-understanding at my end, I don't get how >> the ppm_lock would protext from non atomic async_control()/clear_bit(). >> >> Let me share some trace here, I hope it can better show the difference >> at my end when I get the race. >> I've added some debug prints around these lines, to trace the set/clear >> of the COMMAND/ACK_PENDING, main commands and cci flags. >> >> normal case is: >> --- >> ucsi_send_command_common: UCSI_SET_UOR >> set:COMMAND_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE >> clr:COMMAND_PENDING >> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE >> set:ACK_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE >> clr:ACK_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: ucsi_connector_change >> ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS >> set:COMMAND_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE >> clr:COMMAND_PENDING >> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE >> set:ACK_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE >> clr:ACK_PENDING >> >> racy case: >> --- >> ucsi_send_command_common: UCSI_SET_UOR >> set:COMMAND_PENDING >> ucsi_stm32g0_irq_handler >> ucsi_notify_common: UCSI_CCI_COMMAND_COMPLETE >> clr:COMMAND_PENDING >> ucsi_acknowledge: UCSI_ACK_COMMAND_COMPLETE UCSI_ACK_CONNECTOR_CHANGE >> set:ACK_PENDING >> ucsi_stm32g0_irq_handler <-- up to here nothing supicious >> ucsi_notify_common: ucsi_connector_change >> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE >> ucsi_stm32g0_irq_handler <-- 2nd IRQ before clr:ACK_PENDING >> ucsi_notify_common: ucsi_connector_change >> ucsi_notify_common: UCSI_CCI_ACK_COMPLETE >> clr:ACK_PENDING >> ucsi_send_command_common: UCSI_GET_CONNECTOR_STATUS >> set:COMMAND_PENDING <-- completion already done :-( >> clr:COMMAND_PENDING >> ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5) >> >> I notice the two conditions are met above: both ucsi_connector_change() >> and cci ack completed. >> >> This happens before clear_bit(ACK_PENDING), which lead to anticipate >> completion of the subsequent UCSI_GET_CONNECTOR_STATUS command. >> >> So the test_and_clear_bit() would address a robustness case? > > Ah, I see. So the race is for the completion. I fear that your solution > also doesn't fully solve the problem. The event can arrive after setting > the bit and before sending the command to the hardware. I thought about > doing various tricks, like reinit_completion or something close, but the > issue is that test_and_clear_bit() just makes the race window smaller, > but doesn't eliminate it completely. Hi Dmitry, I've done some tests with reinit_completion(), when entering the ucsi_sync_control_common() routine: + reinit_completion(&ucsi->complete); if (ack) set_bit(ACK_PENDING, &ucsi->flags); else set_bit(COMMAND_PENDING, &ucsi->flags); This indeed avoids the race on the completion at my end. With that, I longer get the error on the subsequent UCSI_GET_CONNECTOR_STATUS command. It looks like a better compromise to me, than test_and_clear_bit() as you pointed out. Best Regards, Fabrice > > It seems the only practical way to handle that is by making sure that > ucsi_notify_common() can sleep and locks the ppm_lock while > processing the CCI. > >> >> Please advise, >> Best Regards, >> Fabrice >> >>> >>>> This is where the test_and_clear_bit() atomic operation helps, to avoid >>>> non atomic operation: >>>> >>>> -> async_control(UCSI_ACK_CC_CI) >>>> new interrupt may occur here >>>> -> clear_bit(ACK_PENDING) >