Hey,

> > The successful boots seem to happen always on cold boots, and the
> > success rate is low (30% or so) on some manual testing here. I haven't
> > seen one single successful boot on system restarts, they all fail like
> > in the previous email.
> >
> > When the boot is successful it looks like this:
> >
>
> This looks to be a firmware issue. The device is in SYS_ERR state during
> boot and that's expected. But what is strange is that the device stays
> in SYS_ERR even after host issues RESET.
>
> Can you try the below diff and see if it does any good?
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index fb99e3727155..a43c3ed77fb1 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -104,7 +104,8 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
>         /* L3 States */
>         {
>                 MHI_PM_LD_ERR_FATAL_DETECT,
> -               MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE
> +               MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE |
> +               MHI_PM_SYS_ERR_PROCESS
>         },
>  };

Tested again in the RPi CM4 based setup, but didn't help, it's failing
in the same way, still says PASS THROUGH state: SYS ERROR:

[ 7.032037] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
[ 7.039213] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit]
[ 7.047759] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[ 7.054573] mhi-pci-generic 0000:01:00.0: using shared MSI
[ 7.060848] mhi mhi0: Requested to power ON
[ 7.065277] mhi mhi0: Attempting power on with EE: PASS THROUGH, state: SYS ERROR
[ 7.072799] mhi mhi0: local ee: INVALID_EE state: RESET device ee: PASS THROUGH state: SYS ERROR
[ 7.081589] mhi mhi0: System error detected
[ 7.085867] mhi-pci-generic 0000:01:00.0: firmware crashed (7)
[ 7.091886] mhi mhi0: Handling state transition: SYS ERROR
[ 7.097399] mhi mhi0: Transitioning from PM state: SYS ERROR Detect to: SYS ERROR Process
[ 7.105588] mhi-pci-generic 0000:01:00.0: firmware crashed (6)
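For reference, here is a rough standalone sketch of what the quoted diff changes, assuming the PM state machine validates a transition by testing the requested state against a per-state bitmask of allowed targets. The bit values, the struct layout and the helper name try_set_pm_state() are made up for illustration and are not the kernel's definitions; only the three MHI_PM_* names come from the diff.

```c
#include <assert.h>

/* Illustrative bit values only (assumption); the real ones are defined
 * in the MHI core's internal header and may differ. */
enum {
	MHI_PM_DISABLE             = 1u << 0,
	MHI_PM_SYS_ERR_PROCESS     = 1u << 1,
	MHI_PM_LD_ERR_FATAL_DETECT = 1u << 2,
};

/* Hypothetical layout: one source state plus a mask of allowed targets. */
struct pm_transition {
	unsigned int from_state;
	unsigned int to_states;
};

/* The L3 entry as it was before the quoted diff. */
static const struct pm_transition l3_before = {
	MHI_PM_LD_ERR_FATAL_DETECT,
	MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE,
};

/* The L3 entry with the quoted diff applied. */
static const struct pm_transition l3_after = {
	MHI_PM_LD_ERR_FATAL_DETECT,
	MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE |
	MHI_PM_SYS_ERR_PROCESS,
};

/* Return the new state if the table entry allows the move,
 * otherwise return the current state unchanged. */
static unsigned int try_set_pm_state(const struct pm_transition *t,
				     unsigned int cur, unsigned int next)
{
	/* The requested state must be exactly one bit. */
	assert(next != 0 && (next & (next - 1)) == 0);

	if (t->from_state != cur)
		return cur;
	if (!(t->to_states & next))
		return cur;
	return next;
}
```

Under these assumptions, a request to move from MHI_PM_LD_ERR_FATAL_DETECT to MHI_PM_SYS_ERR_PROCESS is rejected with l3_before and accepted with l3_after. That is the only behaviour the diff alters, so it only matters if the controller actually reaches the LD_ERR_FATAL_DETECT state.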

I've tested the same patches in my desktop PC (based on 5.13.1, and even
without this last addition) and the boot process is much more stable and
I cannot see the "firmware crashed" errors reported.

My assumption right now is that the pci_generic.c entries we're adding
are correct, but there's some limitation in this system that is making
the EM9191 boot fail, but I still don't know which limitation it is. The
memory addresses in the "BAR 0: assigned" log are definitely different
in the RPi CM4, and also the shared MSI limitation. I recall Thomas
saying that he also tested on a desktop PC forcing the shared MSI
limitation and he had the same kind of firmware errors reported; I'll
also try to test that.

Here are the logs in my desktop pc for reference:

oct 21 09:24:06 ares kernel: mhi-pci-generic 0000:17:00.0: MHI PCI device found: sierra-em919x
oct 21 09:24:06 ares kernel: mhi-pci-generic 0000:17:00.0: BAR 0: assigned [mem 0xb5e01000-0xb5e01fff 64bit]
oct 21 09:24:06 ares kernel: mhi mhi0: Requested to power ON
oct 21 09:24:06 ares kernel: mhi mhi0: Power on setup success
oct 21 09:24:06 ares kernel: mhi mhi0: Handling state transition: READY
oct 21 09:24:06 ares kernel: mhi mhi0: Device in READY State
oct 21 09:24:06 ares kernel: mhi mhi0: Initializing MHI registers
oct 21 09:24:06 ares kernel: mhi mhi0: State change event to state: M0
oct 21 09:24:06 ares kernel: mhi mhi0: Received EE event: MISSION MODE
oct 21 09:24:06 ares kernel: mhi mhi0: Handling state transition: MISSION MODE
oct 21 09:24:06 ares kernel: mhi mhi0: Processing Mission Mode transition
oct 21 09:24:06 ares kernel: mhi_net mhi0_IP_HW0: 100: Updating channel state to: START
oct 21 09:24:06 ares kernel: mhi_net mhi0_IP_HW0: 100: Channel state change to START successful
oct 21 09:24:06 ares kernel: mhi_net mhi0_IP_HW0: 101: Updating channel state to: START
oct 21 09:24:06 ares kernel: mhi_net mhi0_IP_HW0: 101: Channel state change to START successful
oct 21 09:24:08 ares kernel: mhi mhi0: Allowing M3 transition
oct 21 09:24:08 ares kernel: mhi mhi0: Waiting for M3 completion
oct 21 09:24:08 ares kernel: mhi mhi0: State change event to state: M3

-- 
Aleksander
https://aleksander.es