Re: [PATCH RFC v2] soc: qcom: pmic_glink: Fix device access from worker during suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25-01-30 10:43:26, Johan Hovold wrote:
> On Wed, Jan 29, 2025 at 01:46:15PM +0200, Abel Vesa wrote:
> > For historical reasons, the GLINK smem interrupt is registered with
> 
> Please spell out these reasons so that we can determine if they are
> still valid or not.

Sure.

Bjorn, Caleb, please correct me if I'm wrong, but IIRC the historical
reasons here have to do with notifications from mdsp and sdsp being
needed on mobile platforms. Was there something else?

> 
> > IRRQF_NO_SUSPEND flag set, which is the underlying problem here, since the
> > incoming messages can be delivered during late suspend and early
> > resume.
> > 
> > In this specific case, the pmic_glink_altmode_worker() currently gets
> > scheduled on the system_wq which can be scheduled to run while devices
> > are still suspended. This proves to be a problem when a Type-C retimer,
> > switch or mux that is controlled over a bus like I2C, because the I2C
> > controller is suspended.
> 
> This is not just an issue with an i2c retimer, this is a plain generic
> bug in that the glink code which is calling drivers while their devices
> are suspended.

Yes, that's true. I merely used the i2c scenario since this is the first real
usecase where this proves to be a problem. I do agree that this is a
problem in the glink code design though and needs to be reworked
properly.

> 
> > This has been proven to be the case on the X Elite boards where such
> > retimers (ParadeTech PS8830) are used in order to handle Type-C
> > orientation and altmode configuration. The following warning is thrown:
> 
> > The proper solution here should be to not deliver these kind of messages
> > during system suspend at all, or at least make it configurable per glink
> > client. But simply dropping the IRQF_NO_SUSPEND flag entirely will break
> > other clients.
> 
> Which clients? Do we care enough about them to be papering over this
> very real bug which can potentially crash the system (e.g. due to
> unclocked register accesses)?

Well, you just pointed out the ucsi below, which I totally missed.

I guess this fix would have to be replicated to ucsi as well in this
case. But then it is not just a one-off anymore, like Bjorn suggested.

Since this hasn't proved to be a problem until now, I guess all the
other clients are fine with scheduling work while suspended. But this
might be a wrong assumption as well.

> 
> > The final shape of the rework of the pmic glink driver in
> > order to fulfill both the filtering of the messages that need to be able
> > to wake-up the system and the queueing of these messages until the system
> > has properly resumed is still being discussed and it is planned as a
> > future effort.
> > 
> > Meanwhile, the stop-gap fix here is to schedule the pmic glink altmode
> > worker on the system_freezable_wq instead of the system_wq. This will
> > result in the altmode worker not being scheduled to run until the
> > devices are resumed first, which will give the controllers like I2C a
> > chance to resume before the transfer is requested.
> 
> This is incomplete at best, even as a stop gap. I just threw a
> dump_stack() in qmp_combo_typec_switch_set() and see that it is being
> called four times (don't ask me why) from two different workers on
> hotplug.

Yes, that is expected. From the pmic glink altmode worker and then the
ucsi worker (for the gpio based orientation).

> 
> Even if you use the freezable workqueue for altmode notifications, you
> can still end up here via the ucsi notifications, and that now also
> seems to generate a warning in the PM code during resume.
> 
> [   25.684039] Call trace:
> [   25.686633]  show_stack+0x18/0x24 (C)
> [   25.690492]  dump_stack_lvl+0xc0/0xd0
> [   25.694350]  dump_stack+0x18/0x24
> [   25.697851]  qmp_combo_typec_switch_set+0x28/0x210 [phy_qcom_qmp_combo]
> [   25.704764]  typec_switch_set+0x58/0x90 [typec]
> [   25.709525]  ps883x_sw_set+0x2c/0x98 [ps883x]
> [   25.714092]  typec_switch_set+0x58/0x90 [typec]
> [   25.718861]  typec_set_orientation+0x24/0x6c [typec]
> [   25.724068]  pmic_glink_ucsi_connector_status+0x5c/0x88 [ucsi_glink]
> [   25.730726]  ucsi_handle_connector_change+0xc4/0x448 [typec_ucsi]
> [   25.737099]  process_one_work+0x20c/0x610
> [   25.741322]  worker_thread+0x23c/0x378
> [   25.745274]  kthread+0x124/0x128
> [   25.748680]  ret_from_fork+0x10/0x20
> [   25.753631] typec port4-partner: PM: parent port4 should not be sleeping

Yes, totally missed this one. I guess in the end we could apply the same
fix, considering the alternative of reworking the glink code now.

Or I can just start reworking the glink code now but that is a broader
effort since we will be trying to fix the unconditionally wake-up glink
interrupts as well in one go. And will impact all platforms ...

> 
> When you first brought up the possible workaround of using a freezable
> workqueue, I mistakenly thought this would apply to all glink messages
> (I realise that would defeat the purpose of allowing some early
> notifications).

No, only the altmode worker. In fact, everything stays the same with
respect to the other glink messages as we don't want to break other
clients (hopefully Caleb and Bjorn can describe better the ones that
they know about, IIRC, mobile platforms related).

That was the whole point of doing the altmode only (for now).

But now I'm not sure this is a good idea anymore, since we are starting
to play wack-a-mole if we add ucsi to the list.

> 
> You've also found that the glink interrupts are waking up the system
> unconditionally on battery notifications, which would be yet another
> reason for disabling the interrupts (or implementing some kind of
> masking).

Yes, that is something that needs to be addressed. But IIUC it involves
telling adsp to keep quiet about some notifications, which I don't know
if it's possible and how should be done.

For example, with some chargers, I do get "unknown notification: 0x283"
from the pmic_glink.power-supply and it wakes up from suspend even.

> 
> I'm leaning towards just disabling the glink interrupts during suspend,
> but that depends a bit on what (historical?) functionality actually need
> them.

So the disabling needs to be at least per platform if not per client (or
even notification type). This is what the proper solution discussed with
Bjorn off-list was, IIRC.

> 
> Johan

Thanks for reviewing and reporting this new ucsi case.

Abel




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux