Hi Ard,

On 21.12.2021 11:44, Ard Biesheuvel wrote:
> On Tue, 21 Dec 2021 at 11:39, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>> On 22.11.2021 10:28, Ard Biesheuvel wrote:
>>> Wire up the generic support for managing task stack allocations via vmalloc,
>>> and implement the entry code that detects whether we faulted because of a
>>> stack overrun (or future stack overrun caused by pushing the pt_regs array)
>>>
>>> While this adds a fair amount of tricky entry asm code, it should be
>>> noted that it only adds a TST + branch to the svc_entry path. The code
>>> implementing the non-trivial handling of the overflow stack is emitted
>>> out-of-line into the .text section.
>>>
>>> Since on ARM, we rely on do_translation_fault() to keep PMD level page
>>> table entries that cover the vmalloc region up to date, we need to
>>> ensure that we don't hit such a stale PMD entry when accessing the
>>> stack. So we do a dummy read from the new stack while still running from
>>> the old one on the context switch path, and bump the vmalloc_seq counter
>>> when PMD level entries in the vmalloc range are modified, so that the MM
>>> switch fetches the latest version of the entries.
>>>
>>> Note that we need to increase the per-mode stack by 1 word, to gain some
>>> space to stash a GPR until we know it is safe to touch the stack.
>>> However, due to the cacheline alignment of the struct, this does not
>>> actually increase the memory footprint of the struct stack array at all.
>>>
>>> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>> Tested-by: Keith Packard <keithpac@xxxxxxxxxx>
>> This patch landed recently in linux-next 20211220 as commit a1c510d0adc6
>> ("ARM: implement support for vmap'ed stacks"). Sadly it breaks
>> suspend/resume operation on all ARM 32bit Exynos SoCs. Probably the
>> suspend/resume related code must be updated somehow (it partially works
>> on physical addresses and disabled MMU), but I didn't analyze it yet. If
>> you have any hints, let me know.
>>
> Are there any such systems in KernelCI? We caught a suspend/resume
> related issue in development, which is why the hunk below was added.

I think some Exynos-based Odroids (U3 and XU3) were available in
KernelCI some time ago, but I don't know if they are still there.

> In general, any virt-to-phys translation involving an address on the
> stack will become problematic.
>
> Could you please confirm whether the issue persists with the patch
> applied but with CONFIG_VMAP_STACK turned off? Just so we know we are
> looking in the right place?

I've just checked. After disabling CONFIG_VMAP_STACK, suspend/resume
works fine both on commit a1c510d0adc6 and on linux-next 20211220.

>> diff --git a/arch/arm/kernel/sleep.S b/arch/arm/kernel/sleep.S
>> index 43077e11dafd..803b51e5cba0 100644
>> --- a/arch/arm/kernel/sleep.S
>> +++ b/arch/arm/kernel/sleep.S
>> @@ -67,6 +67,14 @@ ENTRY(__cpu_suspend)
>>         ldr     r4, =cpu_suspend_size
>>  #endif
>>         mov     r5, sp                  @ current virtual SP
>> +#ifdef CONFIG_VMAP_STACK
>> +       @ Run the suspend code from the overflow stack so we don't have to rely
>> +       @ on vmalloc-to-phys conversions anywhere in the arch suspend code.
>> +       @ The original SP value captured in R5 will be restored on the way out.
>> +       mov_l   r6, overflow_stack_ptr  @ Base pointer
>> +       mrc     p15, 0, r7, c13, c0, 4  @ Get per-CPU offset
>> +       ldr     sp, [r6, r7]            @ Address of this CPU's overflow stack
>> +#endif
>>         add     r4, r4, #12             @ Space for pgd, virt sp, phys resume fn
>>         sub     sp, sp, r4              @ allocate CPU state on stack
>>         ldr     r3, =sleep_save_sp

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland