Hi,

On Fri, 22 Mar 2024 at 09:16, Baoquan He <bhe@xxxxxxxxxx> wrote:
>
> On 03/21/24 at 08:37pm, Li Huafei wrote:
> >
> >
> > On 2024/3/21 18:06, Dave Young wrote:
> > > Hi,
> > >
> > > On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@xxxxxxxxxx> wrote:
> > >>
> > >> Hi Baoquan,
> > >>
> > >> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> > >>>
> > >>>>> I'm sorry for the delay. Here are some details from the boot log and
> > >>>> /proc/iomem:
> > >>>>> The Boot log:
> > >>>>> [ 0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> > >>>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> > >>>> 11:46:11 UTC 2024
> > >>>>> [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> > >>>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> > >>>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> > >>>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> > >>>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> > >>>> console=ttyS0,115200n8 console=tty0
> > >>>> ......snip...
> > >>>>> [ 0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> > >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> > >>>> reserve_crashkernel_generic+0x7c/0x220
> > >>>>> [ 0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> > >>>> from=0x0000000100000000 max_addr=0x0000400000000000
> > >>>> reserve_crashkernel_generic+0x7c/0x220
> > >>>>> [ 0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [ 0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> > >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> > >>>> reserve_crashkernel_generic+0x11d/0x220
> > >>>>> [ 0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [ 0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> > >>>> (256 MB)
> > >>>>> [ 0.022641] crashkernel reserved: 0x000000c01f000000 -
> > >>>> 0x000000c03f000000 (512 MB)
> > >>>>
> > >>>> Here, crashkernel,low is reserved in region [0x49000000 - 0x59000000] (256 MB)
> > >>>> and crashkernel,high is reserved in region [0x000000c01f000000 -
> > >>>> 0x000000c03f000000] (512 MB).
> > >>>> ......
> > >>>>> [ 0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [ 0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> > >>>> reserved
> > >>>>> [ 0.029861] TSC deadline timer available
> > >>>>
> > >>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, printing
> > >>>> "usable ==> reserved" above. This should be the step that prevents the
> > >>>> earlier reserved crashkernel,low from being added to the iomem tree. I am
> > >>>> not sure what triggered the e820 update.
> > >>
> > >> We added dump_stack() printing in efi_mem_reserve() and found that
> > >> [0x53cbd000-0x53ccffff] was reserved by BGRT:
> > >>
> > >> [ 0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> > >> reserved
> > >> [ 0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> > >> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
> > >> [ 0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> > >> 08/30/2022
> > >> [ 0.032264] Call Trace:
> > >> [ 0.032265] ? dump_stack+0x57/0x6e
> > >> [ 0.032267] ? bgrt_init+0xc2/0xc2
> > >> [ 0.032268] ? __e820__range_update+0x7a/0x1d6
> > >> [ 0.032270] ? bgrt_init+0xc2/0xc2
> > >> [ 0.032272] ? bgrt_init+0xc2/0xc2
> > >> [ 0.032274] ? efi_arch_mem_reserve+0x1a3/0x1d0
> > >> [ 0.032276] ? efi_mem_reserve+0x2d/0x42
> > >> [ 0.032278] ? acpi_parse_bgrt+0xa/0x11
> > >> [ 0.032279] ? acpi_table_parse+0x86/0xbc
> > >> [ 0.032281] ? acpi_boot_init+0x79/0xad
> > >> [ 0.032282] ? setup_arch+0x835/0x954
> > >> [ 0.032284] ? start_kernel+0x5d/0x455
> > >> [ 0.032286] ? secondary_startup_64_no_verify+0xc2/0xcb
> > >>
> > >> efi_reserve_boot_services() has reserved memory of type
> > >> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA before crashkernel.
> > >> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> > >> other modules. Then, the e820_table is directly updated, and the BGRT
> > >> memory is reserved.
> > >>
> > >> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> > >> returns true even when the ranges only partially overlap.
> > >>
> > >>         already_reserved = memblock_is_region_reserved(start, size);
> > >
> > > Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
> > > memory but it does not reserve it due to the region overlapping with
> > > some other reserved region? If so, can you debug and find what exact
> > > memblock reserved region overlaps with the bgrt?
> >
> > Yes.
> > I added the following debug print to efi_reserve_boot_services():
> >
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)
> >
> >                 already_reserved = memblock_is_region_reserved(start, size);
> >
> > +               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
> > +                       "size 0x%llx, type 0x%x, already_reserved %d\n",
> > +                       start, size, md->type, already_reserved);
> > +
> >                 /*
> >                  * Because the following memblock_reserve() is paired
> >                  * with memblock_free_late() for this region in
> >
>
> It's great debugging and analysis, thank you guys. Now there are
> several questions:
>
> 1) Why is memory region [0x5976a018-0x5976abc7] reserved by memblock
> for efi_mem_attr_table? Is it supposed to be outside of the
> EFI_BOOT_SERVICES_DATA area? We may need to check here if it's a bug.

The mem_attr_table memory falls into an EFI Boot Services Data region:
[ 0.000000] efi: mem22: [Boot Data | | | | | | | | | | |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)

>
> > [ 0.000000] random: crng init done
> > [ 0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
> >
> > This memory [0x000000005976a018-0x000000005976abc7] is reserved here, which belongs to EFI_BOOT_SERVICES_DATA.
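To make the overlap point concrete: memblock_is_region_reserved() asks whether [base, base + size) intersects any reserved block, not whether the whole range is covered. Below is a minimal userspace sketch (not kernel code; struct range, ranges_overlap() and range_contains() are hypothetical helpers) that plugs in the ranges from the logs in this thread, with inclusive end addresses converted to sizes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a physical range [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* True if a and b share at least one byte: an intersection test, which is
 * what memblock_is_region_reserved() does against memblock.reserved. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* True only if inner lies entirely within outer. */
static bool range_contains(struct range outer, struct range inner)
{
	return inner.base >= outer.base &&
	       inner.base + inner.size <= outer.base + outer.size;
}

/* Ranges copied from the boot logs in this thread. */
static const struct range boot_data = { 0x51329000, 0xbbd6000 };  /* efi: mem22 Boot Data, 187MB */
static const struct range memattr   = { 0x5976a018, 0xbb0 };      /* efi_memattr_init() reservation */
static const struct range crash_low = { 0x49000000, 0x10000000 }; /* crashkernel,low, 256 MB */
static const struct range bgrt      = { 0x53cbd000, 0x13000 };    /* e820 "usable ==> reserved" */
```

With these values ranges_overlap(boot_data, memattr) is true, so the whole 187MB Boot Data region counts as "already reserved", although range_contains(memattr, boot_data) is false: only 0xbb0 bytes of it actually are. The same helpers show that the BGRT range lies inside both the Boot Data region and the crashkernel,low reservation, which matches the "Reserved" entry inside "Crash kernel" in /proc/iomem.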
> > [ 0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
> >
> > It falls in the following range:
> >
> > [ 0.000000] efi: mem22: [Boot Data | | | | | | | | | | |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)
> >
> > In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] is not reserved because [0x000000005976a018-0x000000005976abc7]
> > has already been reserved and overlaps with it.
>
> 2) Because efi_mem_attr_table memblock-reserved [0x5976a018-0x5976abc7],
> the whole EFI_BOOT_SERVICES_DATA area [0x51329000-0x5cefefff] is not
> memblock-reserved for later freeing. Except for the small area, do we
> still need to memblock-reserve the remaining area? We may need to check
> if this is a bug.

I think the whole EFI Boot Data region should be reserved temporarily by
efi_reserve_boot_services(), but I'm not sure whether it should instead be
reserved partially as multiple smaller regions. I added Ard and the EFI
list in another reply; let's see what the EFI people think.

>
> >
> > [ 0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1
> >
> > It is not reserved by memblock, so this free memory region gets allocated by crashkernel:
> >
> > [ 0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
> > [ 0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
> >
> > In efi_bgrt_init(), it is assumed that the memory of the EFI_BOOT_SERVICES_DATA type has been successfully
> > reserved. Therefore, the address in the range is used directly. As a result, the memory overlaps with
> > the crashkernel region.
>
> (3) efi_bgrt_init() should be innocent because it's supposed to safely
> use the area according to the existing EFI quirk handling.

Agreed

>
> (4) The deferred insertion of crashk_low_res into iomem exposed the above
> EFI issue.
> When we revert the deferred insertion of crashk_res into
> iomem, we can see that the BGRT area is reserved inside the crashkernel
> region; that's problematic.
>
> 2d4fd058-60efefff : System RAM
> 2d4fd058-58ffffff : System RAM
> 49000000-58ffffff : Crash kernel
> 53cbd000-53ccffff : Reserved <---
> 60eff000-704fefff : Reserved
> --
> 93dd424000-93dd9fffff : Kernel bss
> c01f000000-c03effffff : Crash kernel
> d0000000000-d0fffffffff : PCI Bus 0000:00
> d0000000000-d00001fffff : PCI Bus 0000:01
>
> >
> > [ 0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> > >
> > > BTW, the previous email threads are weird and not threading
> > > correctly, so it is hard to find information.
> >
> > It is probably because the log content was too large and the message was put on hold. For my previous email, I received this notice:
> >
> > The reason it is being held:
> >
> > Message body is too big: 248998 bytes with a limit of 40 KB
> >
> > >>
> > >>         /*
> > >>          * Because the following memblock_reserve() is paired
> > >>          * with memblock_free_late() for this region in
> > >>          * efi_free_boot_services(), we must be extremely
> > >>          * careful not to reserve, and subsequently free,
> > >>          * critical regions of memory (like the kernel image) or
> > >>          * those regions that somebody else has already
> > >>          * reserved.
> > >>          *
> > >>          * A good example of a critical region that must not be
> > >>          * freed is page zero (first 4Kb of memory), which may
> > >>          * contain boot services code/data but is marked
> > >>          * E820_TYPE_RESERVED by trim_bios_range().
> > >>          */
> > >>         if (!already_reserved) {
> > >>                 memblock_reserve(start, size);
> > >>
> > >>                 /*
> > >>                  * If we are the first to reserve the region, no
> > >>                  * one else cares about it. We own it and can
> > >>                  * free it later.
> > >>                  */
> > >>                 if (can_free_region(start, size))
> > >>                         continue;
> > >>         }
> > >>
> > >> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
> > >> advance.
> > >> The subsequent crashkernel reservation happens to take this portion of
> > >> memory, which conflicts with BGRT.
> > >>
> > >>> Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> > >>>
> > >>>>
> > >>>> How do you boot into your new 6.8.0 kernel? Did you use kexec -l to jump into the 2nd
> > >>>> kernel, or reboot from the BIOS/firmware into 6.8.0?
> > >>> It was a reboot from the BIOS into 6.8.0. I tried reverting the patch below,
> > >>> and this time the conflicting segment "53cbd000-53ccffff" also appeared in /proc/iomem
> > >>> of the 6.8 kernel.
> > >>>
> > >>> 2d4fd058-60efefff : System RAM
> > >>> 2d4fd058-58ffffff : System RAM
> > >>> 49000000-58ffffff : Crash kernel
> > >>> 53cbd000-53ccffff : Reserved
> > >>> 60eff000-704fefff : Reserved
> > >>> --
> > >>> 93dd424000-93dd9fffff : Kernel bss
> > >>> c01f000000-c03effffff : Crash kernel
> > >>> d0000000000-d0fffffffff : PCI Bus 0000:00
> > >>> d0000000000-d00001fffff : PCI Bus 0000:01
> > >>>>
> > >>>> Reverting the commit below should fix your problem; can you try it?
> > >>>>
> > >>>> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> > >>>> Author: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > >>>> Date:   Fri Dec 29 16:02:13 2023 +0800
> > >>>>
> > >>>>     kdump: defer the insertion of crashkernel resources
> > >>
> > >> _______________________________________________
> > >> kexec mailing list
> > >> kexec@xxxxxxxxxxxxxxxxxxx
> > >> http://lists.infradead.org/mailman/listinfo/kexec
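On Baoquan's question 2), here is a minimal userspace sketch of what reserving the Boot Data region "partially as multiple smaller regions" could mean: compute the sub-ranges not already covered by existing reservations, each of which could then be reserved (and later freed) on its own. uncovered_subranges() and struct range are hypothetical names, not kernel APIs; the sketch assumes the reserved list is sorted by base and non-overlapping (as memblock keeps its reserved array) and ignores page alignment, which a real change would have to handle before freeing anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a physical range [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/*
 * Hypothetical helper: store the parts of 'region' NOT covered by any entry
 * of 'reserved' into 'gaps' (at most max_gaps) and return how many were stored.
 * Assumes 'reserved' is sorted by base and its entries do not overlap.
 */
static size_t uncovered_subranges(struct range region,
				  const struct range *reserved, size_t nr_reserved,
				  struct range *gaps, size_t max_gaps)
{
	uint64_t cur = region.base;		/* first byte not yet accounted for */
	uint64_t end = region.base + region.size;
	size_t n = 0;

	for (size_t i = 0; i < nr_reserved && cur < end; i++) {
		uint64_t r_start = reserved[i].base;
		uint64_t r_end = reserved[i].base + reserved[i].size;

		if (r_end <= cur)
			continue;		/* entirely before the cursor */
		if (r_start >= end)
			break;			/* sorted: nothing later can overlap */
		if (r_start > cur && n < max_gaps)
			gaps[n++] = (struct range){ cur, r_start - cur };
		cur = r_end;			/* skip over the reserved block */
	}
	if (cur < end && n < max_gaps)
		gaps[n++] = (struct range){ cur, end - cur };
	return n;
}
```

For the mem22 region and the single efi_memattr_init() reservation from the logs, this yields two gaps: [0x51329000, 0x5976a018) and [0x5976abc8, 0x5ceff000). Whether the kernel should do anything like this, and how it would interact with memblock_free_late() in efi_free_boot_services(), is the open question for the EFI folks; the sketch only shows the range arithmetic.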