Re: [PATCH 2/3] usb: typec: ucsi: Move unregister out of atomic section

Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx> · Mon, 19 Aug 2024 14:32:53 +0700

On 19 August 2024 10:05:49 GMT+07:00, Bjorn Andersson <quic_bjorande@xxxxxxxxxxx> wrote:
>On Mon, Aug 19, 2024 at 08:16:25AM +0700, Dmitry Baryshkov wrote:
>> On 19 August 2024 06:17:38 GMT+07:00, Bjorn Andersson <quic_bjorande@xxxxxxxxxxx> wrote:
>> >Commit 'caa855189104 ("soc: qcom: pmic_glink: Fix race during
>> >initialization")' moved the pmic_glink client list under a spinlock, as
>> >it is accessed by the rpmsg/glink callback, which in turn is invoked
>> >from IRQ context.
>> >
>> >This means that ucsi_unregister() is now called from IRQ context, which
>> >isn't feasible as it's expecting a sleepable context. An effort is under
>> >way to get GLINK to invoke its callbacks in a sleepable context, but
>> >until then lets schedule the unregistration.
>> >
>> >A side effect of this is that ucsi_unregister() can now happen
>> >after the remote processor, and thereby the communication link with it, is
>> >gone. pmic_glink_send() is amended with a check to avoid the resulting
>> >NULL pointer dereference, but it becomes expecting to see a failing send
>> >upon shutting down the remote processor (e.g. during a restart following
>> >a firmware crash):
>> >
>> >  ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI write request: -5
>> >
>> >Fixes: caa855189104 ("soc: qcom: pmic_glink: Fix race during initialization")
>> >Cc: stable@xxxxxxxxxxxxxxx
>> >Signed-off-by: Bjorn Andersson <quic_bjorande@xxxxxxxxxxx>
>> >---
>> > drivers/soc/qcom/pmic_glink.c       | 10 +++++++++-
>> > drivers/usb/typec/ucsi/ucsi_glink.c | 28 +++++++++++++++++++++++-----
>> > 2 files changed, 32 insertions(+), 6 deletions(-)
>> >
>> >diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
>> >index 58ec91767d79..e4747f1d3da5 100644
>> >--- a/drivers/soc/qcom/pmic_glink.c
>> >+++ b/drivers/soc/qcom/pmic_glink.c
>> >@@ -112,8 +112,16 @@ EXPORT_SYMBOL_GPL(pmic_glink_register_client);
>> > int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
>> > {
>> > 	struct pmic_glink *pg = client->pg;
>> >+	int ret;
>> > 
>> >-	return rpmsg_send(pg->ept, data, len);
>> >+	mutex_lock(&pg->state_lock);
>> >+	if (!pg->ept)
>> >+		ret = -ECONNRESET;
>> >+	else
>> >+		ret = rpmsg_send(pg->ept, data, len);
>> >+	mutex_unlock(&pg->state_lock);
>> >+
>> >+	return ret;
>> > }
>> > EXPORT_SYMBOL_GPL(pmic_glink_send);
>> > 
>> >diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
>> >index ac53a81c2a81..a33056eec83d 100644
>> >--- a/drivers/usb/typec/ucsi/ucsi_glink.c
>> >+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
>> >@@ -68,6 +68,9 @@ struct pmic_glink_ucsi {
>> > 
>> > 	struct work_struct notify_work;
>> > 	struct work_struct register_work;
>> >+	spinlock_t state_lock;
>> >+	unsigned int pdr_state;
>> >+	unsigned int new_pdr_state;
>> > 
>> > 	u8 read_buf[UCSI_BUF_SIZE];
>> > };
>> >@@ -244,8 +247,22 @@ static void pmic_glink_ucsi_notify(struct work_struct *work)
>> > static void pmic_glink_ucsi_register(struct work_struct *work)
>> > {
>> > 	struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
>> >+	unsigned long flags;
>> >+	unsigned int new_state;
>> >+
>> >+	spin_lock_irqsave(&ucsi->state_lock, flags);
>> >+	new_state = ucsi->new_pdr_state;
>> >+	spin_unlock_irqrestore(&ucsi->state_lock, flags);
>> >+
>> >+	if (ucsi->pdr_state != SERVREG_SERVICE_STATE_UP) {
>> >+		if (new_state == SERVREG_SERVICE_STATE_UP)
>> >+			ucsi_register(ucsi->ucsi);
>> >+	} else {
>> >+		if (new_state == SERVREG_SERVICE_STATE_DOWN)
>> >+			ucsi_unregister(ucsi->ucsi);
>> >+	}
>> > 
>> >-	ucsi_register(ucsi->ucsi);
>> >+	ucsi->pdr_state = new_state;
>> > }
>> 
>> Is there a chance if a race condition if the firmware is restarted quickly, but the system is under heavy mist: 
>> - the driver gets DOWN event, updates the state and schedules the work,
>> - the work starts to execute, reads the state,
>> - the driver gets UP event, updates the state, but the work is not rescheduled as it is still executing 
>> - the worker finishes unregistering the UCSI.
>> 
>
>I was under the impression that if we reach the point where we start
>executing the worker, then a second schedule_work() would cause the
>worker to run again. But I might be mistaken here.

I don't have full source code at hand and the docs only speak about being queued, so it is perfectly possible that I am mistaken here.

>
>What I do expect though is that if we for some reason don't start
>executing the work before the state becomes UP again, the UCSI core
>wouldn't know that the firmware has been reset.
>
>
>My proposal is to accept this risk for v6.11 (and get the benefit of
>things actually working) and then take a new swing at getting rid of all
>these workers for v6.12/13. Does that sound reasonable?

Yes, makes sense to me. 

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>

>
>Regards,
>Bjorn
>
>> 
>> 
>> > 
>> > static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv)
>> >@@ -269,11 +286,12 @@ static void pmic_glink_ucsi_callback(const void *data, size_t len, void *priv)
>> > static void pmic_glink_ucsi_pdr_notify(void *priv, int state)
>> > {
>> > 	struct pmic_glink_ucsi *ucsi = priv;
>> >+	unsigned long flags;
>> > 
>> >-	if (state == SERVREG_SERVICE_STATE_UP)
>> >-		schedule_work(&ucsi->register_work);
>> >-	else if (state == SERVREG_SERVICE_STATE_DOWN)
>> >-		ucsi_unregister(ucsi->ucsi);
>> >+	spin_lock_irqsave(&ucsi->state_lock, flags);
>> >+	ucsi->new_pdr_state = state;
>> >+	spin_unlock_irqrestore(&ucsi->state_lock, flags);
>> >+	schedule_work(&ucsi->register_work);
>> > }
>> > 
>> > static void pmic_glink_ucsi_destroy(void *data)
>> >
>>