On 09/06/22 at 03:05pm, Ard Biesheuvel wrote:
> On Mon, 5 Sept 2022 at 14:08, Baoquan He <bhe@xxxxxxxxxx> wrote:
> >
> > On 09/05/22 at 01:28pm, Mike Rapoport wrote:
> > > On Thu, Sep 01, 2022 at 08:25:54PM +0800, Baoquan He wrote:
> > > > On 09/01/22 at 10:24am, Mike Rapoport wrote:
> > > >
> > > > max_zone_phys() only handles cases when CONFIG_ZONE_DMA/DMA32 enabled,
> > > > the disabled CONFIG_ZONE_DMA/DMA32 case is not included. I can change
> > > > it like:
> > > >
> > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > {
> > > > 	phys_addr_t low_mem_mask = U32_MAX;
> > > > 	phys_addr_t phys_start = memblock_start_of_DRAM();
> > > >
> > > > 	if ((!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) ||
> > > > 	     (phys_start > U32_MAX))
> > > > 		low_mem_mask = PHYS_ADDR_MAX;
> > > >
> > > > 	return low_mem_mask + 1;
> > > > }
> > > >
> > > > or add the disabled CONFIG_ZONE_DMA/DMA32 case into crash_addr_low_max()
> > > > as you suggested. Which one do you like better?
> > > >
> > > > static phys_addr_t __init crash_addr_low_max(void)
> > > > {
> > > > 	if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32))
> > > > 		return PHYS_ADDR_MAX + 1;
> > > >
> > > > 	return max_zone_phys(32);
> > > > }
> > >
> > > I like the second variant better.
> >
> > Sure, will change to use the 2nd one. Thanks.
> >
>
> While I appreciate the effort that has gone into solving this problem,
> I don't think there is any consensus that an elaborate fix is required
> to ensure that the crash kernel can be unmapped from the linear map at
> all cost. In fact, I personally think we shouldn't bother, and IIRC,
> Will made a remark along the same lines back when the Huawei engineers
> were still driving this effort.
>
> So perhaps we could align on that before doing yet another version of this?

Yes, certainly. Aligning first can save everybody's effort if opinions differ. Thanks for looking into this and for the suggestion.
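For reference, the behavior of the second crash_addr_low_max() variant quoted above can be modeled in user space. This is a minimal sketch under stated assumptions: phys_addr_t is taken to be 64-bit as on arm64, the CONFIG_ZONE_DMA/DMA32 options become boolean parameters, and model_zone_last() and model_crash_low_last() are hypothetical, simplified stand-ins, not the kernel's max_zone_phys() or the patch code. One thing the model makes visible: with a 64-bit unsigned type, PHYS_ADDR_MAX + 1 wraps to 0, so the sketch returns an inclusive last address instead of an exclusive limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: phys_addr_t is 64-bit, as on arm64. */
typedef uint64_t phys_addr_t;
#define PHYS_ADDR_MAX   UINT64_MAX
#define U32_MAX_ADDR    ((phys_addr_t)UINT32_MAX)
/* Same shape as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical, simplified model of max_zone_phys(): returns the last
 * (inclusive) address of a zone limited to zone_bits, for DRAM spanning
 * [dram_start, dram_end).
 */
phys_addr_t model_zone_last(unsigned int zone_bits,
                            phys_addr_t dram_start, phys_addr_t dram_end)
{
	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);

	assert(dram_end > dram_start);
	if (dram_start > U32_MAX_ADDR)      /* all DRAM above 4 GiB: no low limit */
		zone_mask = PHYS_ADDR_MAX;
	else if (dram_start > zone_mask)    /* DRAM starts above the zone: use 4 GiB */
		zone_mask = U32_MAX_ADDR;

	/* Never report a limit beyond the last byte of DRAM. */
	return zone_mask < dram_end - 1 ? zone_mask : dram_end - 1;
}

/*
 * Models the second crash_addr_low_max() variant, but returns an inclusive
 * bound, because PHYS_ADDR_MAX + 1 wraps to 0 in a 64-bit unsigned type.
 */
phys_addr_t model_crash_low_last(bool zone_dma, bool zone_dma32,
                                 phys_addr_t dram_start, phys_addr_t dram_end)
{
	if (!zone_dma && !zone_dma32)
		return PHYS_ADDR_MAX;       /* no DMA zones: no low-memory limit */
	return model_zone_last(32, dram_start, dram_end);
}
```

For example, with only DMA32 enabled and DRAM at [2 GiB, 10 GiB), model_crash_low_last(false, true, 0x80000000, 0x280000000) yields 0xffffffff, i.e. the low limit sits just below 4 GiB.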
About Will's remark: I checked the related discussion threads, and I guess you are referring to his words in link [1]. I copy them at the bottom for reference. Please correct me if I am wrong. As I understand it, Will said so for two reasons: the patch is too complex, and there is a risk that page-table data the kernel itself is using could share the same block/section mapping as the crashkernel region. Given these two cons, I agree with Will that we would rather take off the protection of the crashkernel region, which is done by mapping or unmapping the region, even though that protection enhances kdump's robustness.

Crashkernel reservation needs to know where low memory ends, so that DMA buffers can be addressed by the dump target, e.g. a storage disk. On current arm64, we have these facts:

1) Except for the Raspberry Pi 4, all arm64 systems support 32-bit DMA addressing. So, except on RPi4, the low memory end can be determined once memblock init is done, namely at the end of arm64_memblock_init(). We don't need to defer the crashkernel reservation until zone_sizes_init() is done. The patch code can check for these cases.
2) For RPi4, if its storage disk only supports 30-bit DMA addressing, we can use crashkernel=xM@yM to place the reservation below 1G as a workaround.

***
Based on the above facts, with my patch applied:

pros:
1) The performance issue is resolved;
2) As you can see, the code with this patch applied is much simpler, more straightforward and clearer;
3) The protection can be kept;
4) Crashkernel reservation is more likely to succeed on small-memory systems, e.g. virt guests: the earlier the reservation is done, the more likely it is to get a whole contiguous chunk of memory.

cons:
1) Only RPi4 is inconvenienced for crashkernel reservation; it needs crashkernel=xM@yM as a workaround.
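The RPi4 workaround in fact 2) comes down to simple arithmetic: an n-bit DMA mask reaches the first 2^n bytes (24 bits: 16 MiB, 30 bits: 1 GiB, 32 bits: 4 GiB), so a fixed crashkernel=xM@yM reservation suits a 30-bit device only if y + x <= 1024. The sketch below uses hypothetical helpers, not kernel code, and accepts only the simplified form crashkernel=<x>M@<y>M, whereas the real parameter also takes K/G suffixes and other forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Exclusive upper bound of memory reachable with an n-bit DMA mask. */
uint64_t dma_reach_bytes(unsigned int bits)
{
	assert(bits < 64);
	return 1ULL << bits;   /* 24 -> 16 MiB, 30 -> 1 GiB, 32 -> 4 GiB */
}

/*
 * Hypothetical parser for the simplified form "crashkernel=<x>M@<y>M".
 * Returns false if the string does not match that exact form.
 */
bool parse_crashkernel_mb(const char *arg, uint64_t *size_mb, uint64_t *base_mb)
{
	unsigned long long x, y;
	char m1, m2, extra;

	/* Exactly 4 conversions: a 5th means trailing garbage. */
	if (sscanf(arg, "crashkernel=%llu%c@%llu%c%c", &x, &m1, &y, &m2, &extra) != 4)
		return false;
	if (m1 != 'M' || m2 != 'M' || x == 0)
		return false;
	*size_mb = x;
	*base_mb = y;
	return true;
}

/* True if [base, base + size) lies entirely below the device's DMA reach. */
bool crash_region_fits(uint64_t size_mb, uint64_t base_mb, unsigned int dma_bits)
{
	uint64_t end_mb = base_mb + size_mb;

	if (end_mb < base_mb || end_mb > (UINT64_MAX >> 20))
		return false;                  /* would overflow when converted to bytes */
	return (end_mb << 20) <= dma_reach_bytes(dma_bits);
}
```

For example, crashkernel=256M@512M ends at 768 MiB and fits under a 30-bit mask, while crashkernel=256M@896M ends at 1152 MiB and does not.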
***
Taking off the protection, which is done by mapping or unmapping the crashkernel region, as you and Will suggested:

pros:
1) The performance issue is resolved;
2) RPi4 gets the same convenience in setting crashkernel;

cons:
1) The crashkernel region is not protected at all;
2) The code logic is convoluted: crashkernel is reserved in two separate places, one at the end of arm64_memblock_init() and one at the end of bootmem_init();
3) Except when CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are both disabled, the crashkernel reservation is deferred. On small-memory systems, e.g. virt guests, this increases the risk that the reservation fails, very possibly due to memory fragmentation.

Besides, comparing the above two solutions, I also want to say that kdump is developed for enterprise-level systems, and we need to keep reality in mind when choosing a reasonable solution. E.g. on x86_64, the normal kernel always has a DMA zone covering the first 16M and a DMA32 zone from 16M to 4G. For kdump, we ignore the DMA zone entirely because it exists for ISA-style devices. Kdump has never supported ISA-style devices with only 24-bit DMA addressing, because that doesn't make sense: we have never heard of an enterprise-level x86_64 system that needs kdump to work with such a device.

Hi Ard, Will, Catalin and other reviewers,

Above is my understanding of, and thinking about, the issue we encountered. Please help check it and point out anything missing or incorrect.

Hi Nicolas,

If convenient, could you please help clarify whether the storage disk or network card on RPi4 can only address 32-bit DMA buffers? I'd really appreciate it.

***
[1] Will's remark on Huawei's patch
https://lore.kernel.org/all/20220718131005.GA12406@willie-the-truck/T/#u

====quote Will's remark here
I do not think that this complexity is justified. As I have stated on numerous occasions already, I would prefer that we leave the crashkernel mapped when rodata is not "full".
That fixes your performance issue and matches what we do for module code, so I do not see a security argument against it. I do not plan to merge this patch as-is. ===