On Tue, Nov 5, 2019 at 9:06 AM John Stultz <john.stultz@xxxxxxxxxx> wrote:
> On Tue, Nov 5, 2019 at 2:29 AM Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > Hi John,
> >
> > On Mon, Nov 04, 2019 at 05:16:42PM -0800, John Stultz wrote:
> > > On Tue, Oct 29, 2019 at 8:31 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > >
> > > > Shared and writable mappings (__S.1.) should be clean (!dirty) initially
> > > > and made dirty on a subsequent write either through the hardware DBM
> > > > (dirty bit management) mechanism or through a write page fault. A clean
> > > > pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY
> > > > clear.
> > > >
> > > > The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and
> > > > PTE_DIRTY clear. Prior to commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY
> > > > bit handling out of set_pte_at()"), it was the responsibility of
> > > > set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the
> > > > software PTE_DIRTY bit was not set. However, the above commit removed
> > > > the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in
> > > > set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions
> > > > unchanged. The result is that shared+writable mappings are now dirty by
> > > > default.
> > > >
> > > > Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}.
> > > > In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_*
> > > > attributes.
> > > >
> > > > Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
> > > > Cc: <stable@xxxxxxxxxxxxxxx> # 4.14.x-
> > > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > > Signed-off-by: Catalin Marinas <catalin.marinas@xxxxxxx>
> > >
> > > Hey,
> > > So I'm not yet sure why, but I've just validated that this patch is
> > > causing trouble with booting AOSP on HiKey960 with 5.4-rc6 (-rc5 works
> > > fine).
> >
> > Hmm. Annoying this wasn't spotted by CI.
> >
> > > It's odd, because the system does boot and is alive, but seems to stall
> > > out at the boot animation, and userland never finishes coming up to
> > > the home screen. It just sits there without a useful error message
> > > that I can find so far. Reverting just this patch seems to solve it
> > > and it boots all the way.
> >
> > Given that I don't think the HiKey960 supports h/w DBM, my initial guess
> > is that the GPU is stuck on a page fault.
> >
> > > I'll try to dig further to see what might be going on (the mali driver
> > > is a prime suspect here), but I wanted to raise the flag since we're
> > > at the end of the -rc cycle.
> >
> > What exactly are you using for the mali driver?
>
> I've got an old r10p0 bifrost blob we were given and kernel patches
> I've carried forward since then.
>
> Again, I don't want to distract you too much for something that may be
> related to a blob driver. I mostly just wanted to raise a flag in case
> there was something off that might affect others.

Just as a further detail (about to close up for the day), I'm seeing
this issue on the HiKey board as well. Similarly, reverting 747a70e60b72
resolves it. It's a mali blob driver too, but a different one (utgard),
which makes me suspect this might be a real issue w/ something in AOSP.

I'll be testing on a db845c tomorrow morning to see if I can trigger it
there as well.

thanks
-john
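
For anyone following along, here is a rough user-space sketch of the
clean/dirty rule described in the quoted commit message. It is not the
kernel code: the bit positions are assumptions modelled on the arm64
pgtable headers, and every model_* / MODEL_* name is hypothetical and
exists only for this illustration. It shows why a shared+writable
protection value without PTE_RDONLY reads as dirty from the start, and
why adding PTE_RDONLY makes it clean until the first write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, modelled on the arm64 pgtable headers. */
#define PTE_RDONLY (UINT64_C(1) << 7)   /* hardware read-only (AP[2]) */
#define PTE_DBM    (UINT64_C(1) << 51)  /* dirty bit management */
#define PTE_DIRTY  (UINT64_C(1) << 55)  /* software dirty bit */
#define PTE_WRITE  PTE_DBM              /* PTE_WRITE is PTE_DBM, per the patch text */

typedef uint64_t model_pte_t;           /* hypothetical stand-in for pte_t */

/* Shared+writable protections before and after the fix (other bits omitted). */
#define MODEL_PAGE_SHARED_OLD   (PTE_WRITE)
#define MODEL_PAGE_SHARED_FIXED (PTE_WRITE | PTE_RDONLY)

static bool model_pte_write(model_pte_t pte)    { return (pte & PTE_WRITE) != 0; }
static bool model_pte_sw_dirty(model_pte_t pte) { return (pte & PTE_DIRTY) != 0; }

/* Writable and not read-only: indistinguishable from "already written". */
static bool model_pte_hw_dirty(model_pte_t pte)
{
	return model_pte_write(pte) && !(pte & PTE_RDONLY);
}

static bool model_pte_dirty(model_pte_t pte)
{
	return model_pte_sw_dirty(pte) || model_pte_hw_dirty(pte);
}

/* "Clean" as stated in the commit message: PTE_RDONLY set, PTE_DIRTY clear. */
static bool model_pte_clean(model_pte_t pte)
{
	return (pte & PTE_RDONLY) && !(pte & PTE_DIRTY);
}

/* Simplified write fault on a writable mapping: mark dirty, drop read-only. */
static model_pte_t model_write_fault(model_pte_t pte)
{
	if (model_pte_write(pte))
		pte = (pte | PTE_DIRTY) & ~PTE_RDONLY;
	return pte;
}

/* Sanity check of the model itself. */
static void model_selfcheck(void)
{
	assert(model_pte_dirty(MODEL_PAGE_SHARED_OLD));
	assert(model_pte_clean(MODEL_PAGE_SHARED_FIXED));
}
```

In this model, a mapping created with MODEL_PAGE_SHARED_FIXED only
becomes dirty after model_write_fault() (or, on hardware with DBM, after
the hardware clears the read-only bit). On hardware without DBM that
means the first write has to take a fault, which is consistent with
Will's guess above about where a device might get stuck.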