Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Mani,

On Thu, Nov 18, 2021 at 6:57 AM Manivannan Sadhasivam
<manivannan.sadhasivam@xxxxxxxxxx> wrote:
>
> Some devices tend to trigger SYS_ERR interrupt while the host handling
> SYS_ERR state of the device during power up. This creates a race
> condition and causes a failure in booting up the device.
>
> The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> handling in mhi_async_power_up(). Once the host detects that the device
> is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> process the reset request. During this time, the device triggers SYS_ERR
> interrupt to the host and host starts handling SYS_ERR execution.
>
> So by the time the device has completed reset, host starts SYS_ERR
> handling. This causes the race condition and the modem fails to boot.
>
> Hence, register the IRQ handler only after handling the SYS_ERR check
> to avoid getting spurious IRQs from the device.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> Reported-by: Aleksander Morgado <aleksander@xxxxxxxxxxxxx>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx>
> ---
>
> Changes in v3:
>
> * Moved BHI_INTVEC setup after irq setup
> * Used interval_us as the delay for the polling API
>
> Changes in v2:
>
> * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
>

I tried this v3 patch and I'm not sure if it's working properly in my
setup; not all boots are successfully bringing the modem up.

Once I installed it, I kept having this kind of logs on every boot:
[    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.052191] mhi mhi0: Requested to power ON
[    7.168042] mhi mhi0: Power on setup success
[    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
[   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16

I didn't have debug logs enabled in that build, so I created a new one
with them enabled, and... these are the logs I'm getting now (not the
same ones as above, not sure why):

// Cold boots fail (tried several times)
[    7.033370] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
[    7.040558] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.049105] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.055923] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.062130] mhi mhi0: Requested to power ON
[    7.066351] mhi mhi0: Attempting power on with EE: PASS THROUGH,
state: SYS ERROR
[   20.572010] mhi mhi0: Power on setup success
[   20.576412] mhi mhi0: Handling state transition: PBL
[   55.139820] mhi mhi0: Device failed to enter MHI Ready
[   55.144978] mhi mhi0: MHI did not enter READY state
[   55.149876] mhi mhi0: Handling state transition: DISABLE
[   55.155203] mhi mhi0: Processing disable transition with PM state:
Firmware Download Error
[   55.163482] mhi mhi0: Triggering MHI Reset in device
[   89.727820] mhi mhi0: Device failed to clear MHI Reset
[   89.732975] mhi mhi0: Waiting for all pending event ring processing
to complete
[   89.740316] mhi mhi0: Waiting for all pending threads to complete
[   89.746422] mhi mhi0: Reset all active channels and remove MHI devices
[   89.752963] mhi mhi0: Resetting EV CTXT and CMD CTXT
[   89.757940] mhi mhi0: Error moving from PM state: Firmware Download
Error to: DISABLE
[   89.765785] mhi mhi0: Exiting with PM state: Firmware Download
Error, MHI state: RESET
[   89.773793] mhi-pci-generic 0000:01:00.0: failed to power up MHI controller
[   89.781067] mhi-pci-generic: probe of 0000:01:00.0 failed with error -110

// Warm reboots afterwards seem to go ok (tried several times)
[    7.072762] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
[    7.079942] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.088505] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.095331] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.101628] mhi mhi0: Requested to power ON
[    7.105859] mhi mhi0: Attempting power on with EE: PASS THROUGH,
state: SYS ERROR
[    7.224053] mhi mhi0: Power on setup success
[    7.228373] mhi mhi0: local ee: INVALID_EE state: RESET device ee:
MISSION MODE state: READY
[    7.236878] mhi mhi0: Handling state transition: PBL
[    7.241871] mhi mhi0: Device in READY State
[    7.246118] mhi mhi0: Initializing MHI registers
[    7.250775] mhi mhi0: Wait for device to enter SBL or Mission mode
[   15.418147] mhi mhi0: local ee: MISSION MODE state: READY device
ee: MISSION MODE state: READY
[   16.025027] mhi mhi0: State change event to state: M0
[   16.025041] mhi mhi0: local ee: MISSION MODE state: READY device
ee: MISSION MODE state: M0
[   16.038500] mhi mhi0: Received EE event: MISSION MODE
[   16.038505] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.051671] mhi mhi0: Handling state transition: MISSION MODE
[   16.057434] mhi mhi0: Processing Mission Mode transition
[   16.063780] mhi_net mhi0_IP_HW0: 100: Updating channel state to: START
[   16.197073] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.201196] mhi_net mhi0_IP_HW0: 100: Channel state change to START
successful
[   16.212565] mhi_net mhi0_IP_HW0: 101: Updating channel state to: START
[   16.362262] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.362268] mhi_net mhi0_IP_HW0: 101: Channel state change to START
successful
[   18.860081] mhi mhi0: Allowing M3 transition
[   18.864380] mhi mhi0: Waiting for M3 completion
[   19.080218] mhi mhi0: State change event to state: M3
[   19.080228] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M3
[   21.899049] mhi_wwan_ctrl mhi0_DUN: 32: Updating channel state to: START
[   21.924031] mhi mhi0: Entered with PM state: M3, MHI state: M3
[   21.952174] mhi mhi0: State change event to state: M0
[   21.952193] mhi mhi0: local ee: MISSION MODE state: M3 device ee:
MISSION MODE state: M0
[   21.967549] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   21.967553] mhi_wwan_ctrl mhi0_DUN: 32: Channel state change to
START successful

I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
come in this last iteration or in an earlier one. Do you want me to
try with v1 and v2 as well?

The patch that was working very reliably (100%) for me was the "bus:
mhi: Register IRQ handler after SYS_ERR check during power up" one,
which you attached here:
https://www.spinics.net/lists/linux-arm-msm/msg97646.html

-- 
Aleksander
https://aleksander.es



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux