On 09/08/22 at 09:33pm, Baoquan He wrote: > On 09/06/22 at 03:05pm, Ard Biesheuvel wrote: > > On Mon, 5 Sept 2022 at 14:08, Baoquan He <bhe@xxxxxxxxxx> wrote: > > > > > > On 09/05/22 at 01:28pm, Mike Rapoport wrote: > > > > On Thu, Sep 01, 2022 at 08:25:54PM +0800, Baoquan He wrote: > > > > > On 09/01/22 at 10:24am, Mike Rapoport wrote: > > > > > > > > > > max_zone_phys() only handles cases when CONFIG_ZONE_DMA/DMA32 enabled, > > > > > the disabledCONFIG_ZONE_DMA/DMA32 case is not included. I can change > > > > > it like: > > > > > > > > > > static phys_addr_t __init crash_addr_low_max(void) > > > > > { > > > > > phys_addr_t low_mem_mask = U32_MAX; > > > > > phys_addr_t phys_start = memblock_start_of_DRAM(); > > > > > > > > > > if ((!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) || > > > > > (phys_start > U32_MAX)) > > > > > low_mem_mask = PHYS_ADDR_MAX; > > > > > > > > > > return low_mem_mast + 1; > > > > > } > > > > > > > > > > or add the disabled CONFIG_ZONE_DMA/DMA32 case into crash_addr_low_max() > > > > > as you suggested. Which one do you like better? > > > > > > > > > > static phys_addr_t __init crash_addr_low_max(void) > > > > > { > > > > > if (!IS_ENABLED(CONFIG_ZONE_DMA) && !IS_ENABLED(CONFIG_ZONE_DMA32)) > > > > > return PHYS_ADDR_MAX + 1; > > > > > > > > > > return max_zone_phys(32); > > > > > } > > > > > > > > I like the second variant better. > > > > > > Sure, will change to use the 2nd one . Thanks. > > > > > > > While I appreciate the effort that has gone into solving this problem, > > I don't think there is any consensus that an elaborate fix is required > > to ensure that the crash kernel can be unmapped from the linear map at > > all cost. In fact, I personally think we shouldn't bother, and IIRC, > > Will made a remark along the same lines back when the Huawei engineers > > were still driving this effort. > > > > So perhaps we could align on that before doing yet another version of this? > > Yes, certainly. That can save everybody's effort if there's different > opinion. Thanks for looking into this and the suggestion. > > About Will's remark, I checked those discussing threads, guess you are > mentioning the words in link [1]. I copy them at bottom for better > reference. Pleasae correct me if I am wrong. > > With my understanding, Will said so because the patch is too complex, > and there's risk that page table kernel data itself is using could share > the same block/section mapping as crashkernel region. With these > two cons, I agree with Will that we would rather take off the protection > on crashkernel region which is done by mapping or unmapping the region, > even though the protection enhances kdump's ronusness. > > Crashkernel reservation needs to know the low meory end so that DMA > buffer can be addressed by the dumping target, e.g storage disk. On the > current arm64, we have facts: > 1)Currently, except of Raspberry Pi 4, all arm64 systems can support > 32bit DMA addressing. So, except of RPi4, the low memory end can be > decided after memblock init is done, namely at the end of > arm64_memblock_init(). We don't need to defer the crashkernel > reservation until zone_sizes_init() is done. Those cases can be checked > in patch code. > 2)For RPi4, if its storage disk is 30bit DMA addressing, then we can > use crashkernel=xM@yM to specify reservation location under 1G to > work around this. > > *** > Based on above facts, with my patch applied: > pros: > 1) Performance issue is resolved; > 2) As you can see, the code with this patch applied will much > simpler, more straightforward and clearer; > 3) The protection can be kept; > 4) Crashkernel reservation can be easier to succeed on small memory > system, e.g virt guest system. The earlier the reservation is done, > it's more likely to get the whole chunk of meomry. > cons: > 1) Only RPi4 is put in inconvenience for crashkernel reservation. It > needs to use crashkernel=xM@yM to work around. > > *** > Take off the protection which is done by mapping or unmapping > crashkernel region as you and Will suggested: > pros: > 1) Performance issue is resolved; > 2) RPi4 will have the same convenience to set crashkernel; > > cons: > 1) No protection is taken on crashkernel region; > 2) Code logic is twisting. There are two places to separately reserve > crashkernel, one is at the end of arm64_memblock_init(), one is at > the end of bootmem_init(). > 3) Except of both CONFIG_ZONE_DMA|DMA32 disabled case, crashkernel > reservation is deferred. On small memory system, e.g virt guest system, > it increases risk that the resrevation could fail very possibly caused > by memory fragmentation. > > Besides, comparing the above two solutions, I also want to say kdump > is developed for enterprise level of system. We need combine with > reality when considering reasonable solution. E.g on x86_64, it has DMA > zone of 16M and DMA32 zone from 16M to 4G always in normal kernel. For > kdump, we ignore DMA zone directly because it's for ISA style devices. > Kdump doesn't support ISA style device with only 24bit DMA addressing > capability at the beginning, because it doesn't make sense, we never > hear that an enterprise level of x86_64 system needs to arm with kdump. Sorry, here I mean we never hear that an enterprise level of x86_64 system owns ISA storage disk and needs to arm with kdump. > > Hi Ard, Will, Catalin and other reviewers, > > Above is my understaning and thinking about the encountered issue, > plesae help check and point out what's missing or incorrect. > > Hi Nicolas, > > If it's convenient to you, please help make clear if the storage disk or > network card can only address 32bit DMA buffer on RPi4. Really ~~30bit, typo > appreciate that. > > *** > [1]Will's remark on Huawei's patch > https://lore.kernel.org/all/20220718131005.GA12406@willie-the-truck/T/#u > > ====quote Will's remark here > I do not think that this complexity is justified. As I have stated on > numerous occasions already, I would prefer that we leave the crashkernel > mapped when rodata is not "full". That fixes your performance issue and > matches what we do for module code, so I do not see a security argument > against it. > > I do not plan to merge this patch as-is. > === > _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec