Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks

Ard Biesheuvel <ardb@xxxxxxxxxx> · Wed, 5 Jan 2022 18:02:23 +0100

On Wed, 5 Jan 2022 at 17:50, Jon Hunter <jonathanh@xxxxxxxxxx> wrote:
>
>
> On 05/01/2022 11:12, Ard Biesheuvel wrote:
>
> ...
>
> > Thanks for the report.
> >
> > It would be helpful if you could provide some more context:
> > - does it happen on a LPAE build too?
>
> Enabling CONFIG_ARM_LPAE does work.
>
> > - does it only happen on SMP capable systems?
> > - does it reproduce on such systems when using only a single CPU?
> > (i.e., pass 'nosmp' on the kernel command line)
>
> Adding 'nosmp' does not help.
>
> > - when passing 'no_console_suspend' on the kernel command line, are
> > any useful diagnostics produced?
>
> Adding 'no_console_suspend' does not produce any interesting logs.
>
> > - is there any way you could tell whether the crash/hang (assuming
> > that is what you are observing) occurs on the suspend path or on
> > resume?
>
> That is not clear. I see it entering suspend, but not clear if it is
> failing on entering suspend or resuming.
>

Thanks a lot for providing this info.

The fact that enabling LPAE makes the issue go away is a fairly strong
hint that one of the CPUs comes up running in an address space that
lacks the stack's vmapping in its copy of the swapper_pg_dir region.
LPAE builds map swapper_pg_dir directly, so there the copy can never
go out of sync.

Given that vmappings are global, and therefore cached in the TLB
across context switches, it is quite possible that a task that runs
before suspend already lacks the vmapping of the stack in its page
tables, but that this does not cause any issues until the CPU is reset
completely (which takes the cached TLB entries down with it).
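
To make the suspected sequence concrete, here is a toy userspace model
in plain C. It is only a sketch of the hypothesis above, not kernel
code: every toy_* name, the table size, the STACK_IDX slot and the
0x1000 entry value are made up for illustration, and the "TLB" is just
an array that caches global entries until it is wiped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PGD    8   /* toy number of top-level entries (assumption) */
#define STACK_IDX 5   /* hypothetical slot covering the vmap'ed stack */

struct toy_pgd { unsigned long entry[NR_PGD]; };

static struct toy_pgd toy_swapper;     /* stands in for swapper_pg_dir */
static unsigned long toy_tlb[NR_PGD];  /* global entries survive task switches */

/* New address space: a snapshot of the master entries at creation time. */
static void toy_pgd_alloc(struct toy_pgd *pgd)
{
	memcpy(pgd, &toy_swapper, sizeof(*pgd));
}

/* A vmapping that gets installed in the master table only. */
static void toy_vmap_stack(void)
{
	toy_swapper.entry[STACK_IDX] = 0x1000;
}

/* A TLB hit wins; otherwise walk the active table. False means a fault. */
static bool toy_access(const struct toy_pgd *active, int idx)
{
	assert(idx >= 0 && idx < NR_PGD);
	if (toy_tlb[idx])
		return true;
	if (!active->entry[idx])
		return false;
	toy_tlb[idx] = active->entry[idx];
	return true;
}

/* Full CPU reset, as across suspend/resume: cached entries are lost. */
static void toy_cpu_reset(void)
{
	memset(toy_tlb, 0, sizeof(toy_tlb));
}

/* Bit 0: stack access worked before reset. Bit 1: it worked after reset. */
static int toy_scenario(void)
{
	struct toy_pgd stale;
	int result = 0;

	memset(&toy_swapper, 0, sizeof(toy_swapper));
	toy_cpu_reset();

	toy_pgd_alloc(&stale);  /* copy taken before the stack is mapped */
	toy_vmap_stack();       /* mapping lands in the master only */

	/* Some other context touches the stack and populates the TLB. */
	toy_access(&toy_swapper, STACK_IDX);

	if (toy_access(&stale, STACK_IDX))  /* served from the TLB */
		result |= 1;
	toy_cpu_reset();
	if (toy_access(&stale, STACK_IDX))  /* must walk the stale copy */
		result |= 2;
	return result;
}
```

In this model toy_scenario() returns 1: the access through the stale
copy succeeds while the TLB still holds the entry, and faults once the
reset has emptied it. Accessing via toy_swapper directly, which is the
rough analogue of the LPAE case, keeps working after the reset.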

So in summary, this gives me something to chew on, and hopefully, I
will be able to provide a proper fix shortly.