On Fri, Oct 22, 2021 at 11:20:00AM +0200, Aleksander Morgado wrote:
> Hey,
>
> > > The successful boots seem to happen always on cold boots, and the
> > > success rate is low (30% or so) on some manual testing here. I haven't
> > > seen one single successful boot on system restarts, they all fail like
> > > in the previous email.
> > >
> > > When the boot is successful it looks like this:
> > >
> >
> > This looks to be a firmware issue. The device is in SYS_ERR state during
> > boot and that's expected. But what is strange is that the device stays
> > in SYS_ERR even after host issues RESET.
> >
> > Can you try the below diff and see if it does any good?
> >
> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> > index fb99e3727155..a43c3ed77fb1 100644
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -104,7 +104,8 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
> >         /* L3 States */
> >         {
> >                 MHI_PM_LD_ERR_FATAL_DETECT,
> > -               MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE
> > +               MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_DISABLE |
> > +               MHI_PM_SYS_ERR_PROCESS
> >         },
> >  };
>
> Tested again in the RPi CM4 based setup, but didn't help, it's failing
> in the same way, still says PASS THROUGH state: SYS ERROR:
>

Yes, that's expected. As I said, the device is going to a bad state and
from the host side, we could only try to recover it.

> [    7.032037] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
> [    7.039213] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit]
> [    7.047759] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
> [    7.054573] mhi-pci-generic 0000:01:00.0: using shared MSI
> [    7.060848] mhi mhi0: Requested to power ON
> [    7.065277] mhi mhi0: Attempting power on with EE: PASS THROUGH, state: SYS ERROR
> [    7.072799] mhi mhi0: local ee: INVALID_EE state: RESET device ee: PASS THROUGH state: SYS ERROR
> [    7.081589] mhi mhi0: System error detected
> [    7.085867] mhi-pci-generic 0000:01:00.0: firmware crashed (7)
> [    7.091886] mhi mhi0: Handling state transition: SYS ERROR
> [    7.097399] mhi mhi0: Transitioning from PM state: SYS ERROR Detect to: SYS ERROR Process
> [    7.105588] mhi-pci-generic 0000:01:00.0: firmware crashed (6)
>

What happened after this point? Can you share the complete log?

> I've tested the same patches in my desktop PC (based on 5.13.1, and
> even without this last addition) and the boot process is much more
> stable and I cannot see the "firmware crashed" errors reported. My
> assumption right now is that the pci_generic.c entries we're adding
> are correct, but there's some limitation in this system that is making
> the EM9191 boot fail, but I still don't know which limitation it is.
> The memory addresses in the "BAR 0: assigned" log are definitely
> different in the RPi CM4, and also the shared MSI limitation. I recall
> Thomas saying that he also tested on a desktop PC forcing the shared
> MSI limitation and he had the same kind of firmware errors reported;
> I'll also try to test that.
>

I think the PCI behaviour could be the issue between these 2 setups. But
for knowing exactly what's happening we need to get the log of the modem
(I don't think you can get that though).

Thanks,
Mani