Re: ath11k resume fails due to kernel blocks probing MHI virtual devices

"Rafael J. Wysocki" <rafael@xxxxxxxxxx> · Mon, 29 Jan 2024 13:22:27 +0100

On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang@xxxxxxxxxxx> wrote:
>
> Hi Rafael and Pavel,
>
> Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> related to the kernel PM framework and the MHI module.
>
> Before introducing the issue details, I'd like to summarize how ath11k
> interacts with MHI stack to download WLAN firmware to hardware target:
> 1. When booting/restarting, ath11k powers on the MHI module and waits for
> the MHI channels to be ready.
> 2. On power-on, the MHI stack creates some virtual MHI devices, which
> represent MHI hardware channels, and adds them to the MHI bus. This
> triggers the MHI client driver, named QRTR, to get matched and probe those
> MHI devices. In probe, QRTR initializes the MHI channels and finally moves
> them to the ready state.
> 3. Once the MHI channels are ready, ath11k downloads the WLAN firmware to
> the hardware target, and then WLAN is working.
>
> Such a flow works well in general, but introduces issues in the hibernation
> cycle: when preparing for hibernation, ath11k powers down MHI, which
> results in the MHI devices being destroyed, and thus QRTR resets the MHI
> channels. When resuming from hibernation, ath11k powers on MHI and waits
> for the MHI channels to be ready in its resume callback. As said above, MHI
> creates and adds MHI devices to the MHI bus, but they can't be probed at
> that time because device probing is prohibited by device_block_probing(),
> and this finally results in an ath11k resume timeout.
>
> Now there is a potential fix to this issue which would need changes in
> the MHI stack, i.e., don't destroy MHI devices while hibernating.

Exactly.

> And we have discussed this change at length with the MHI community, see [1]
> and [2].
>
> However Mani (the MHI maintainer) doesn't think it's right to fix it in
> MHI stack. Instead, he thought we might need to add a new PM callback
> which will be called after device probe is unblocked. By registering
> such a callback ath11k can wait for the dependency driver, i.e., QRTR, to
> probe and initialize those MHI devices.
>
> Your thoughts?

I'm not quite sure why one would do the pointless device destruction and
re-creation in the hibernation flow and then add a new callback to the PM
core to work around this.

It doesn't sound like a straightforward approach to me.
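
To make the ordering problem from the quoted description concrete, here
is a toy user-space model of it. This is only a sketch, not kernel code:
every toy_* name is hypothetical, the deferred probes are replayed
synchronously (the real driver core handles deferred probing on its own
schedule), and a false return from toy_ath11k_resume() stands in for the
resume timeout.

```c
/* Toy user-space model; NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEFERRED 8

struct toy_dev {
	bool channel_ready;	/* set by the modelled QRTR probe */
};

static bool toy_probing_blocked;
static struct toy_dev *toy_deferred[TOY_MAX_DEFERRED];
static int toy_n_deferred;

/* Stands in for QRTR's probe: moves the MHI channel to the ready state. */
static void toy_qrtr_probe(struct toy_dev *dev)
{
	dev->channel_ready = true;
}

/* Models device_block_probing(): later probe requests are deferred. */
static void toy_block_probing(void)
{
	toy_probing_blocked = true;
}

/* Models unblocking: replay every probe that was deferred meanwhile. */
static void toy_unblock_probing(void)
{
	toy_probing_blocked = false;
	for (int i = 0; i < toy_n_deferred; i++)
		toy_qrtr_probe(toy_deferred[i]);
	toy_n_deferred = 0;
}

/* Models adding an MHI device to the bus: probe now, or defer if blocked. */
static void toy_device_add(struct toy_dev *dev)
{
	dev->channel_ready = false;
	if (toy_probing_blocked) {
		assert(toy_n_deferred < TOY_MAX_DEFERRED);
		toy_deferred[toy_n_deferred++] = dev;
		return;
	}
	toy_qrtr_probe(dev);
}

/*
 * Models the ath11k resume callback: power on MHI (which adds the device),
 * then check the channel. Returning false represents the resume timeout.
 */
static bool toy_ath11k_resume(struct toy_dev *dev)
{
	toy_device_add(dev);
	return dev->channel_ready;
}
```

In this model, calling toy_block_probing() before toy_ath11k_resume()
makes the resume fail, and the channel only becomes ready once
toy_unblock_probing() runs. If the device is not re-created during
resume, there is nothing left waiting on a blocked probe.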

> [1] https://lists.infradead.org/pipermail/ath11k/2023-December/005098.html
> [2] https://lists.infradead.org/pipermail/ath11k/2024-January/005205.html