On Thu, 2020-11-19 at 17:10 +0000, Catalin Marinas wrote:
> On Thu, Nov 19, 2020 at 03:09:58PM +0100, Nicolas Saenz Julienne wrote:
> > On Fri, 2020-11-13 at 11:29 +0000, Catalin Marinas wrote:
> > [...]
> > > > > > Let me stress that knowing the DMA constraints in the system before reserving
> > > > > > crashkernel's regions is necessary if we ever want it to work seamlessly on all
> > > > > > platforms. Be it small stuff like the Raspberry Pi or huge servers with TB of
> > > > > > memory.
> > > > >
> > > > > Indeed. So we have 3 options (so far):
> > > > >
> > > > > 1. Allow the crashkernel reservation to go into the linear map but set
> > > > >    it to invalid once allocated.
> > > > >
> > > > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > > > >    creating the linear map. We may have to rely on some SoC ID here
> > > > >    instead of actual DMA ranges.
> > > > >
> > > > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
> > > > >    reservations and not rely on arm64_dma_phys_limit in
> > > > >    reserve_crashkernel().
> > > > >
> > > > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > > > issues we had on large crashkernel reservations regressing on some
> > > > > platforms (though it's been a while since, they mostly went quiet ;)).
> > > > > However, with Chen's crashkernel patches we end up with two
> > > > > reservations, one in the low DMA zone and one higher, potentially above
> > > > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > > > reservations than what we have now.
> > > > >
> > > > > If (1) works, I'd go for it (James knows this part better than me),
> > > > > otherwise we can go for (3).
> > > >
> > > > Overall, I'd prefer (1) as well, and I'd be happy to have a go at it. If not
> > > > I'll append (3) in this series.
> > >
> > > I think for 1 we could also remove the additional KEXEC_CORE checks,
> > > something like below, untested:
> > >
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index 3e5a6913acc8..27ab609c1c0c 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
> > >  	int flags = 0;
> > >  	u64 i;
> > >
> > > -	if (rodata_full || debug_pagealloc_enabled())
> > > +	if (rodata_full || debug_pagealloc_enabled() ||
> > > +	    IS_ENABLED(CONFIG_KEXEC_CORE))
> > >  		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > >
> > >  	/*
> > > @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
> > >  	 * the following for-loop
> > >  	 */
> > >  	memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > > -#ifdef CONFIG_KEXEC_CORE
> > > -	if (crashk_res.end)
> > > -		memblock_mark_nomap(crashk_res.start,
> > > -				    resource_size(&crashk_res));
> > > -#endif
> > >
> > >  	/* map all the memory banks */
> > >  	for_each_mem_range(i, &start, &end) {
> > > @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
> > >  	__map_memblock(pgdp, kernel_start, kernel_end,
> > >  		       PAGE_KERNEL, NO_CONT_MAPPINGS);
> > >  	memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> > > -
> > > -#ifdef CONFIG_KEXEC_CORE
> > > -	/*
> > > -	 * Use page-level mappings here so that we can shrink the region
> > > -	 * in page granularity and put back unused memory to buddy system
> > > -	 * through /sys/kernel/kexec_crash_size interface.
> > > -	 */
> > > -	if (crashk_res.end) {
> > > -		__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
> > > -			       PAGE_KERNEL,
> > > -			       NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> > > -		memblock_clear_nomap(crashk_res.start,
> > > -				     resource_size(&crashk_res));
> > > -	}
> > > -#endif
> > >  }
> > >
> > >  void mark_rodata_ro(void)
> >
> > So as far as I'm concerned this is good enough for me. I took the time to
> > properly test crashkernel on RPi4 using the series, this patch, and another
> > small fix to properly update /proc/iomem.
> >
> > I'll send v7 soon, but before, James (or anyone for that matter) any obvious
> > push-back to Catalin's solution?
>
> I talked to James earlier and he was suggesting that we check the
> command line for any crashkernel reservations and only disable block
> mappings in that case, see the diff below on top of the one I already
> sent (still testing it).

That's even better :)

> If you don't have any other changes for v7, I'm happy to pick v6 up on
> top of the no-block-mapping fix.

Yes, I've got a small change in patch #1: the crashkernel reservation has to be
performed before request_standard_resources() is called, which is OK, since
we're all set up by then. I moved the crashkernel reservation to the end of
bootmem_init().

I attached the patch. If it's easier for you I'll send v7.

Regards,
Nicolas

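For illustration only, James' suggestion quoted above (only give up block mappings when the command line actually asks for a crashkernel reservation) could be sketched in userspace C roughly as follows. This is not the diff Catalin refers to, which is trimmed from this mail. The helpers cmdline_has_crashkernel() and linear_map_flags() are hypothetical names, the flag values are stand-ins rather than the kernel's definitions, and the real check would live next to the flag selection in map_mem().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel's mapping flags; the values are illustrative. */
#define NO_BLOCK_MAPPINGS (1 << 0)
#define NO_CONT_MAPPINGS  (1 << 1)

/*
 * Hypothetical helper: true if "crashkernel=" starts an argument in
 * cmdline, i.e. at the beginning of the string or right after a space.
 */
static bool cmdline_has_crashkernel(const char *cmdline)
{
	static const char opt[] = "crashkernel=";
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		if (p == cmdline || p[-1] == ' ')
			return true;
		/* Matched inside another word, keep scanning after it. */
		p += sizeof(opt) - 1;
	}
	return false;
}

/*
 * Hypothetical helper mirroring the flag selection at the top of
 * map_mem(): page-level mappings are forced only when something needs
 * them, here including a crashkernel= request on the command line.
 */
static int linear_map_flags(bool rodata_full, bool debug_pagealloc,
			    const char *cmdline)
{
	if (rodata_full || debug_pagealloc ||
	    cmdline_has_crashkernel(cmdline))
		return NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
	return 0;
}
```

With this sketch, a command line such as "console=ttyS0 crashkernel=256M" selects page-level mappings, while one without a crashkernel= argument keeps block mappings unless rodata_full or debug_pagealloc already force them off.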
From 00dd2c31a027c42f80b76990a686000a36cc3bcf Mon Sep 17 00:00:00 2001
From: Nicolas Saenz Julienne <nsaenzjulienne@xxxxxxx>
Date: Wed, 14 Oct 2020 14:02:44 +0200
Subject: [PATCH] arm64: mm: Move reserve_crashkernel() into mem_init()

crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into bootmem_init() since request_standard_resources()
depends on it.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@xxxxxxx>
Tested-by: Jeremy Linton <jeremy.linton@xxxxxxx>
---

Changes since v6:
 - Move reserve placement earlier.

---
 arch/arm64/mm/init.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 71d463544400..fafdf992fd32 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -389,8 +389,6 @@ void __init arm64_memblock_init(void)
 	else
 		arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-	reserve_crashkernel();
-
 	reserve_elfcorehdr();
 
 	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -430,6 +428,12 @@ void __init bootmem_init(void)
 	sparse_init();
 	zone_sizes_init(min, max);
 
+	/*
+	 * request_standard_resources() depends on crashkernel's memory being
+	 * reserved, so do it here.
+	 */
+	reserve_crashkernel();
+
 	memblock_dump_all();
 }
 
-- 
2.29.2