在 2019年10月15日 19:04, Eric W. Biederman 写道: > lijiang <lijiang@xxxxxxxxxx> writes: > >> 在 2019年10月13日 11:54, Eric W. Biederman 写道: >>> Dave Young <dyoung@xxxxxxxxxx> writes: >>> >>>> Hi Eric, >>>> >>>> On 10/12/19 at 06:26am, Eric W. Biederman wrote: >>>>> Lianbo Jiang <lijiang@xxxxxxxxxx> writes: >>>>> >>>>>> When the crashkernel kernel command line option is specified, the >>>>>> low 1MiB memory will always be reserved, which makes that the memory >>>>>> allocated later won't fall into the low 1MiB area, thereby, it's not >>>>>> necessary to create a backup region and also no need to copy the first >>>>>> 640k content to a backup region. >>>>>> >>>>>> Currently, the code related to the backup region can be safely removed, >>>>>> so lets clean up. >>>>>> >>>>>> Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx> >>>>>> --- >>>>> >>>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c >>>>>> index eb651fbde92a..cc5774fc84c0 100644 >>>>>> --- a/arch/x86/kernel/crash.c >>>>>> +++ b/arch/x86/kernel/crash.c >>>>>> @@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs) >>>>>> >>>>>> #ifdef CONFIG_KEXEC_FILE >>>>>> >>>>>> -static unsigned long crash_zero_bytes; >>>>>> - >>>>>> static int get_nr_ram_ranges_callback(struct resource *res, void *arg) >>>>>> { >>>>>> unsigned int *nr_ranges = arg; >>>>>> @@ -234,9 +232,15 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg) >>>>>> { >>>>>> struct crash_mem *cmem = arg; >>>>>> >>>>>> - cmem->ranges[cmem->nr_ranges].start = res->start; >>>>>> - cmem->ranges[cmem->nr_ranges].end = res->end; >>>>>> - cmem->nr_ranges++; >>>>>> + if (res->start >= SZ_1M) { >>>>>> + cmem->ranges[cmem->nr_ranges].start = res->start; >>>>>> + cmem->ranges[cmem->nr_ranges].end = res->end; >>>>>> + cmem->nr_ranges++; >>>>>> + } else if (res->end > SZ_1M) { >>>>>> + cmem->ranges[cmem->nr_ranges].start = SZ_1M; >>>>>> + cmem->ranges[cmem->nr_ranges].end = res->end; >>>>>> + cmem->nr_ranges++; >>>>>> + } >>>>> >>>>> What is going on with this chunk? I can guess but this needs a clear >>>>> comment. >>>> >>>> Indeed it needs some code comment, this is based on some offline >>>> discussion. cat /proc/vmcore will give a warning because ioremap is >>>> mapping the system ram. >>>> >>>> We pass the first 1M to kdump kernel in e820 as system ram so that 2nd >>>> kernel can use the low 1M memory because for example the trampoline >>>> code. >>>> >>>>> >>>>>> >>>>>> return 0; >>>>>> } >>>>> >>>>>> @@ -356,9 +337,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) >>>>>> memset(&cmd, 0, sizeof(struct crash_memmap_data)); >>>>>> cmd.params = params; >>>>>> >>>>>> - /* Add first 640K segment */ >>>>>> - ei.addr = image->arch.backup_src_start; >>>>>> - ei.size = image->arch.backup_src_sz; >>>>>> + /* >>>>>> + * Add the low memory range[0x1000, SZ_1M], skip >>>>>> + * the first zero page. >>>>>> + */ >>>>>> + ei.addr = PAGE_SIZE; >>>>>> + ei.size = SZ_1M - PAGE_SIZE; >>>>>> ei.type = E820_TYPE_RAM; >>>>>> add_e820_entry(params, &ei); >>>>> >>>>> Likewise here. Why do we need a special case? >>>>> Why the magic with PAGE_SIZE? >>>> >>>> Good catch, the zero page part is useless, I think no other special >>>> reason, just assumed zero page is not usable, but it should be ok to >>>> remove the special handling, just pass 0 - 1M is good enough. >>> >>> But if we have stopped special casing the low 1M. Why do we need a >>> special case here at all? >>> >> Here, need to pass the low memory range to kdump kernel, which will guarantee >> the availability of low memory in kdump kernel, otherwise, kdump kernel won't >> use the low memory region. >> >>> If you need the special case it is almost certainly wrong to say you >>> have ram above 640KiB and below 1MiB. That is the legacy ROM and video >>> MMIO area. >>> >>> There is a reason the original code said 640KiB. >>> >> Do you mean that the 640k region is good enough here instead of 1MiB? > > Reading through the code of crash_setup_memap_entries I see that what > the code is doing now. The code is repeating the e820 memory map with > the memory areas that were not reserved for the crash kernel removed. > > In which case what the code needs to be doing something like: > > cmd.type = E820_TYPE_RAM; > flags = IORESOURCE_MEM; > walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, 1024*1024, &cmd, > memmap_entry_callback); > The above code does not get the results what we expected, it gets the reserved memory marked as 'IORES_DESC_RESERVED' in the low 1MiB range. Finally, kdump kernel happened the panic as follow: ...... [ 3.555662] Kernel panic - not syncing: Real mode trampoline was not allocated [ 3.556660] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc3+ #4 [ 3.556660] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018 [ 3.556660] Call Trace: [ 3.556660] dump_stack+0x46/0x60 [ 3.556660] panic+0xfb/0x2d7 [ 3.556660] ? hv_init_spinlocks+0x7f/0x7f [ 3.556660] init_real_mode+0x27/0x1fa [ 3.556660] ? hv_init_spinlocks+0x7f/0x7f [ 3.556660] ? do_one_initcall+0x46/0x1e4 [ 3.556660] ? proc_register+0xd0/0x130 [ 3.556660] ? kernel_init_freeable+0xe2/0x242 [ 3.556660] ? rest_init+0xaa/0xaa [ 3.556660] ? kernel_init+0xa/0x106 [ 3.556660] ? ret_from_fork+0x22/0x40 [ 3.556660] Rebooting in 10 seconds.. [ 3.556660] ACPI MEMORY or I/O RESET_REG. I modified the above code, and tested it. This can find out the system ram in the low 1MiB range. And it worked well. @@ -356,11 +338,11 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) memset(&cmd, 0, sizeof(struct crash_memmap_data)); cmd.params = params; + /* Add the low 1MiB */ + cmd.type = E820_TYPE_RAM; + flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; + walk_iomem_res_desc(IORES_DESC_NONE, flags, 0, 1024*1024 - 1, &cmd, + memmap_entry_callback); > Depending on which bugs exist it might make sense to limit this to > the low 640KiB. But finding something the kernel already recognizes > as RAM should prevent most of those problems already. Barring bugs > I admit it doesn't make sense to repeat the work that someone else > has already done. > > This bit: > /* Add e820 reserved ranges */ > cmd.type = E820_TYPE_RESERVED; > flags = IORESOURCE_MEM; > walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd, > memmap_entry_callback); > > Should probably start at 1MiB instead of 0. Just so we don't report the If so, it can not find out the reserved memory marked as 'IORES_DESC_RESERVED' in the low 1MiB range, finally, it doesn't pass the reserved memory in the low 1MiB to kdump kernel, which could cause some problems, such as SME or PCI MMCONFIG issue. > memory below 1MiB as unconditionally reserved. I don't properly > understand the IORES_DESC_RESERVED flag, and how that differs from I found three commits about 'IORES_DESC_RESERVED' flag, hope this helps. 1.ae9e13d621d6 ("x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED") 2.5da04cc86d12 ("x86/mm: Rework ioremap resource mapping determination") 3.980621daf368 ("x86/crash: Add e820 reserved ranges to kdump kernel's e820 table") > flags. So please test my suggestions to verify the code works as > expected. > I have tested the two changes that you mentioned, please refer to the reply above. Thanks. Lianbo > Eric > _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec