On Mon, Oct 28, 2019 at 01:02:19PM -0700, Bjorn Andersson wrote:
> On Mon 28 Oct 12:11 PDT 2019, Mark Brown wrote:
>
> > On Mon, Oct 28, 2019 at 11:40:19AM -0700, Bjorn Andersson wrote:
> > > On Mon 28 Oct 10:48 PDT 2019, Mark Brown wrote:
> > > > On Mon, Oct 28, 2019 at 08:03:08AM -0700, kernelci.org bot wrote:
> >
> > > > Today's -next (and Friday's) fails to boot on db820c:
> >
> > > > >     defconfig:
> > > > >         gcc-8:
> > > > >             apq8096-db820c: 1 failed lab
> >
> > > > It looks like it deadlocks somewhere, the last things in the log are a
> > > > failure to start ufshcd-qcom and then an RCU stall some time later:
> >
> > > db820c has been failing intermittently for a while now, it seems that
> > > booting with kpti enabled causes something to go wrong. There is
> > > nothing strange in the kernel logs and ftrace seems to indicate that all
> > > the CPUs are idling nicely.
> >
> > Oh dear.  Adding Catalin and Will.  Is it definitely KPTI that's
> > triggering stuff?  It did turn up some bugs on other systems, though
> > it's a bit strange it's only manifesting in KernelCI...
>
> I did a test recently where I booted my db820c 100 times with kpti=yes
> and 100 times with kpti=no on the kernel command line, and the result
> was 90% failure to reach console vs 0%. Going back and looking at the
> logs for the 10% indicated that the boot CPU was fine, but I had stalls
> reported on other CPUs.
>
> In an effort to rule out driver bugs I reduced the DT to CPUs, the core
> clocks, gic, timers and serial driver, and I still saw the problem.
>
> I have not looked at this with jtag and hence do not know what secure
> world is doing.

Hmm. Is this a recent thing? Neither kpti nor the snapdragon 820 are
particularly new. Might be worth checking that
CONFIG_QCOM_FALKOR_ERRATUM_1003 is enabled and getting patched in at
runtime -- we had hardware issues during kpti development with this CPU.

Will
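
For anyone wanting to script the check suggested above, here is a minimal
sketch, not a definitive tool. It assumes the running kernel exposes its
configuration at /proc/config.gz (which needs CONFIG_IKCONFIG_PROC) and
that the workaround, when applied, shows up in the boot log on a line
containing "erratum 1003". That substring is an assumption about the log
wording, so adjust it to whatever your kernel prints. The helper names
(option_state, erratum_reported) are hypothetical and exist only for
this example.

```python
import gzip
import subprocess

OPTION = "CONFIG_QCOM_FALKOR_ERRATUM_1003"


def option_state(config_text, option):
    """Return the value of a Kconfig option ('y', 'm', ...) or None.

    None covers both '# CONFIG_X is not set' and the option being absent.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def erratum_reported(log_text, needle="erratum 1003"):
    """Return True if any log line mentions the needle (case-insensitive).

    The default needle is an assumption about how the kernel names the
    workaround in its boot log.
    """
    needle = needle.lower()
    return any(needle in line.lower() for line in log_text.splitlines())


if __name__ == "__main__":
    # Assumes CONFIG_IKCONFIG_PROC=y so that /proc/config.gz exists.
    with gzip.open("/proc/config.gz", "rt") as fh:
        state = option_state(fh.read(), OPTION)
    print(f"{OPTION}: {state if state is not None else 'not set'}")

    # Reading the kernel log may require root on some systems.
    dmesg = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    print("workaround mentioned in boot log:", erratum_reported(dmesg.stdout))
```

If the option is set but nothing in the boot log mentions the erratum,
that would suggest the workaround is built in but not being applied to
these CPUs, which is the case worth chasing here.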