On Wed, 5 Jan 2022 at 17:50, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>
>
> On 05/01/2022 11:12, Ard Biesheuvel wrote:
>
> ...
>
> > Thanks for the report.
> >
> > It would be helpful if you could provide some more context:
> > - does it happen on a LPAE build too?
>
> Enabling CONFIG_ARM_LPAE does work.
>
> > - does it only happen on SMP capable systems?
> > - does it reproduce on such systems when using only a single CPU?
> > (i.e., pass 'nosmp' on the kernel command line)
>
> Adding 'nosmp' does not help.
>
> > - when passing 'no_console_suspend' on the kernel command line, are
> > any useful diagnostics produced?
>
> Adding 'no_console_suspend' does not produce any interesting logs.
>
> > - is there any way you could tell whether the crash/hang (assuming
> > that is what you are observing) occurs on the suspend path or on
> > resume?
>
> That is not clear. I see it entering suspend, but not clear if it is
> failing on entering suspend or resuming.
>

Thanks a lot for providing this info.

The fact that enabling LPAE makes the issue go away is a fairly strong
hint that one of the CPUs comes up running in an address space that
lacks the stack's vmapping in its copy of the swapper_pg_dir region -
LPAE builds map swapper_pg_dir directly so there it can never go out
of sync.

Given that vmappings are global, and therefore cached in the TLB
across context switches, it is not unlikely that the missing vmapping
of the stack is in a task that runs before suspend, but does not cause
any issues until after the CPU is reset completely (which takes cached
TLB entries down with it).

So in summary, this gives me something to chew on, and hopefully, I
will be able to provide a proper fix shortly.
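
To make the hypothesis above concrete, here is a toy user-space model
in C. It is only a sketch and not kernel code: every name in it
(kernel_table, mm_init, cpu_access, stack_survives_reset, and so on)
is made up for illustration, and it assumes a deliberately simplified
picture in which a non-LPAE mm carries a private snapshot of the
kernel entries while an LPAE mm always walks the shared table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define KERNEL_ENTRIES 4  /* hypothetical number of kernel top-level entries */
#define STACK_IDX      2  /* hypothetical entry covering the vmapped stack */

/* Toy stand-in for the kernel part of a page directory. */
struct kernel_table { bool present[KERNEL_ENTRIES]; };

struct mm {
	struct kernel_table copy;          /* private snapshot (non-LPAE model) */
	const struct kernel_table *kernel; /* table the toy MMU actually walks */
};

struct cpu {
	const struct mm *active;
	bool tlb[KERNEL_ENTRIES];          /* cached global kernel entries */
};

/* Assumption: non-LPAE snapshots the kernel entries, LPAE shares them. */
static void mm_init(struct mm *mm, const struct kernel_table *swapper,
		    bool lpae)
{
	mm->copy = *swapper;
	mm->kernel = lpae ? swapper : &mm->copy;
}

/* An access succeeds on a TLB hit, or via a walk of the active table. */
static bool cpu_access(struct cpu *cpu, int idx)
{
	assert(idx >= 0 && idx < KERNEL_ENTRIES);
	if (cpu->tlb[idx])
		return true;
	if (!cpu->active->kernel->present[idx])
		return false;
	cpu->tlb[idx] = true;
	return true;
}

/* A full CPU reset takes all cached TLB entries down with it. */
static void cpu_reset(struct cpu *cpu)
{
	memset(cpu->tlb, 0, sizeof(cpu->tlb));
}

/* Returns true if the stack is still reachable after the simulated reset. */
static bool stack_survives_reset(bool lpae)
{
	struct kernel_table swapper = { { false } };
	struct mm old_task, new_task;
	struct cpu cpu = { 0 };

	mm_init(&old_task, &swapper, lpae);  /* created before the vmapping */
	swapper.present[STACK_IDX] = true;   /* stack gets vmapped */
	mm_init(&new_task, &swapper, lpae);  /* created after the vmapping */

	cpu.active = &new_task;
	if (!cpu_access(&cpu, STACK_IDX))    /* table walk, fills the TLB */
		return false;

	cpu.active = &old_task;              /* context switch keeps global entries */
	if (!cpu_access(&cpu, STACK_IDX))    /* stale table is masked by the TLB */
		return false;

	cpu_reset(&cpu);                     /* suspend/resume wipes the TLB */
	return cpu_access(&cpu, STACK_IDX);  /* now the stale table is exposed */
}
```

In this model stack_survives_reset(false) fails only at the final
access, after the reset, while stack_survives_reset(true) succeeds
throughout, which mirrors the reported behaviour with and without
CONFIG_ARM_LPAE.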