Hi Chen Zhou, On 13/06/2019 12:27, Chen Zhou wrote: > On 2019/6/6 0:29, James Morse wrote: >> On 07/05/2019 04:50, Chen Zhou wrote: >>> When crashkernel is reserved above 4G in memory, kernel should >>> reserve some amount of low memory for swiotlb and some DMA buffers. >> >>> Meanwhile, support crashkernel=X,[high,low] in arm64. When use >>> crashkernel=X parameter, try low memory first and fall back to high >>> memory unless "crashkernel=X,high" is specified. >> >> What is the 'unless crashkernel=...,high' for? I think it would be simpler to relax the >> ARCH_LOW_ADDRESS_LIMIT if reserve_crashkernel_low() allocated something. >> >> This way "crashkernel=1G" tries to allocate 1G below 4G, but fails if there isn't enough >> memory. "crashkernel=1G crashkernel=16M,low" allocates 16M below 4G, which is more likely >> to succeed, if it does it can then place the 1G block anywhere. >> > Yeah, this is much simpler. >>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c >>> index 413d566..82cd9a0 100644 >>> --- a/arch/arm64/kernel/setup.c >>> +++ b/arch/arm64/kernel/setup.c >>> @@ -243,6 +243,9 @@ static void __init request_standard_resources(void) >>> request_resource(res, &kernel_data); >>> #ifdef CONFIG_KEXEC_CORE >>> /* Userspace will find "Crash kernel" region in /proc/iomem. */ >>> + if (crashk_low_res.end && crashk_low_res.start >= res->start && >>> + crashk_low_res.end <= res->end) >>> + request_resource(res, &crashk_low_res); >>> if (crashk_res.end && crashk_res.start >= res->start && >>> crashk_res.end <= res->end) >>> request_resource(res, &crashk_res); >> >> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called >> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it >> find "Crash kernel", you are always going to get the kernel placed in the lower portion. >> >> I suspect this isn't what you want, can we rename crashk_low_res for arm64 so that >> existing kexec-tools doesn't use it? > In my patchset, in addition to the kernel patches, i also modify the kexec-tools. > arm64: support more than one crash kernel regions(http://lists.infradead.org/pipermail/kexec/2019-April/022792.html). > In kexec-tools patch, we read all the "Crash kernel" entry and load crash kernel high. But we can't rely on people updating user-space when they update the kernel! [...] >> I'm afraid you've missed the ugly bit of the crashkernel reservation... >> >> arch/arm64/mm/mmu.c::map_mem() marks the crashkernel as 'nomap' during the first pass of >> page-table generation. This means it isn't mapped in the linear map. It then maps it with >> page-size mappings, and removes the nomap flag. >> >> This is done so that arch_kexec_protect_crashkres() and >> arch_kexec_unprotect_crashkres() can remove the valid bits of the crashkernel mapping. >> This way the old-kernel can't accidentally overwrite the crashkernel. It also saves us if >> the old-kernel and the crashkernel use different memory attributes for the mapping. >> >> As your low-memory reservation is intended to be used for devices, having it mapped by the >> old-kernel as cacheable memory is going to cause problems if those CPUs aren't taken >> offline and go corrupting this memory. (we did crash for a reason after all) >> >> >> I think the simplest thing to do is mark the low region as 'nomap' in >> reserve_crashkernel() and always leave it unmapped. We can then describe it via a >> different string in /proc/iomem, something like "Crash kernel (low)". Older kexec-tools >> shouldn't use it, (I assume its not using strncmp() in a way that would do this by >> accident), and newer kexec-tools can know to describe it in the DT, but it can't write to it. > I did miss the bit of the crashkernel reservation. > I will fix this in next version. I think all that is needed is to make the low-region nmap, and describe it via /proc/iomem with a name where nothing will try and use it by accident. Thanks, James