* Andrew F. Davis <afd@xxxxxx> [170215 14:14]:
> On 02/15/2017 01:12 PM, Tony Lindgren wrote:
> > * Tony Lindgren <tony@xxxxxxxxxxx> [170215 10:40]:
> >> * Tony Lindgren <tony@xxxxxxxxxxx> [170214 11:39]:
> >>> * Tony Lindgren <tony@xxxxxxxxxxx> [170213 13:51]:
> >>>> Commit 3251885285e1 ("ARM: OMAP4+: Reset CPU1 properly for kexec") started
> >>>> resetting cpu1 because of a kexec boot issue I was seeing earlier in 2016
> >>>> on omap4 when doing kexec boot between two different kernel versions. The
> >>>> booted kernel ended up trying to use the old kernel start-up address unless
> >>>> cpu1 was reset before configuring the cpu1 start-up address.
> >>>>
> >>>> It seems the reset part was not correct but probably working around some
> >>>> other issue. I have not been able to reproduce this issue any longer despite
> >>>> testing with backported patches back to v4.6 kernel. So it is possible this
> >>>> issue was caused by other work in progress kexec patches I had applied. Or
> >>>> it is possible some other fixes have made the issue go away.
> >>>>
> >>>> The unconditional reset of cpu1 can cause issues booting some devices. For
> >>>> example, bootloader configured secure OS running on cpu1 will fail as the
> >>>> configuration is not preserved as reported by Andrew F. Davis <afd@xxxxxx>.
> >>>>
> >>>> Let's fix the issue by reverting the cpu1 reset parts. If it turns out we
> >>>> still need to reset cpu1 in some cases, we can add it back and do it
> >>>> conditionally.
> >>>
> >>> Actually with this I'm now seeing cpu1 not come up after a suspend/resume
> >>> cycle on duovero:
> >>>
> >>> [ 118.257415] CPU1: shutdown
> >>> [ 118.294616] Error taking CPU1 up: -2
> >>> [ 118.299072] PM: noirq resume of devices complete after 3.723 msecs
> >>> [ 118.303802] PM: early resume of devices complete after 3.723 msecs
> >>>
> >>> So this issue needs to be investigated more.
> >>
> >> And then today the omap4 suspend/resume issue is no longer reproducible..
> >> Go figure.
> >>
> >> But then doing more testing I noticed that also omap5 needs the reset.
> >> Without it we get the following on omap5-uevm doing a kexec boot. So clearly
> >> the reset cannot be just removed at least for omap4 and omap5.
> >
> > And also the same issue happens doing kexec on beagle-x15 naturally if
> > the cpu1 reset is removed.
> >
>
> When a core actually powers up it idles in ROM code waiting for
> OMAP_AUX_CORE_BOOT_0 to be set. When we shut down a core it is not really
> powered off, we just let it spin in omap4_cpu_die() or
> omap4_secondary_startup() waiting on OMAP_AUX_CORE_BOOT_0, just like if
> it were still trapped in ROM after a reset.
>
> The issue with this fake startup idle loop is that, unlike the ROM based
> startup idle loop, these do *not* jump to the address we stored in
> OMAP_AUX_CORE_BOOT_1, they just make the assumption that they can safely
> jump to the kernel startup function.
>
> So when we tell this core to boot, and it is not in the real ROM startup
> loop, it breaks stuff as it jumps to the old kernel's
> secondary_startup() even though we gave it the correct address in
> OMAP_AUX_CORE_BOOT_1.

Yes this is probably what's going on here. Note that the error I pasted
was booting the same kernel where that address should be correct though.
So there might be something else to it also.

> Resetting the core to put it back in the real ROM idle loop is wrong, the
> two idle loop functions above should be fixed to respect the address in
> OMAP_AUX_CORE_BOOT_1 and not to make assumptions, this should take care
> of the kexec failure in a sane way.

OK care to try to patch it as now you also have a reproducible test case
for kexec too?

Regards,

Tony
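
For illustration, the difference between the two wake-up paths described
above can be modelled in plain user-space C. This is only a sketch under
stated assumptions, not the kernel or ROM code: the struct, both function
names and both addresses are hypothetical, and the real
OMAP_AUX_CORE_BOOT_0 check looks at specific bits rather than "non-zero".
It shows why a core parked in the kernel's own loop can end up at the old
kernel's start-up address after kexec, while a core parked in a ROM-style
loop follows whatever was written to OMAP_AUX_CORE_BOOT_1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the two wake-up registers. */
struct aux_core_boot {
    uint32_t boot_0; /* models OMAP_AUX_CORE_BOOT_0: non-zero means "wake up" */
    uint32_t boot_1; /* models OMAP_AUX_CORE_BOOT_1: start-up address */
};

/* Made-up addresses, for illustration only. */
#define OLD_KERNEL_SECONDARY_STARTUP 0x80100000u
#define NEW_KERNEL_SECONDARY_STARTUP 0x80200000u

/*
 * ROM-style idle loop: once woken, jump to the address stored in boot_1.
 * Returns 0 while the core is still parked.
 */
uint32_t rom_style_wakeup_target(const struct aux_core_boot *regs)
{
    assert(regs != NULL);
    if (regs->boot_0 == 0)
        return 0;
    return regs->boot_1;
}

/*
 * Kernel-style idle loop as described in the thread: it waits on boot_0
 * but ignores boot_1 and jumps to the start-up address of the kernel
 * that parked the core (passed in here as built_in_startup).
 */
uint32_t kernel_style_wakeup_target(const struct aux_core_boot *regs,
                                    uint32_t built_in_startup)
{
    assert(regs != NULL);
    if (regs->boot_0 == 0)
        return 0;
    return built_in_startup;
}
```

With boot_0 set and boot_1 holding NEW_KERNEL_SECONDARY_STARTUP,
rom_style_wakeup_target() returns the new address, whereas
kernel_style_wakeup_target() called with OLD_KERNEL_SECONDARY_STARTUP
still returns the old one. That mismatch is the kexec failure mode being
discussed; making the kernel loops read boot_1 would remove it in this
model.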