Re: [PATCH 3/3 v3] x86/kdump: clean up all the code related to the backup region

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2019年10月15日 19:04, Eric W. Biederman 写道:
> lijiang <lijiang@xxxxxxxxxx> writes:
> 
>> 在 2019年10月13日 11:54, Eric W. Biederman 写道:
>>> Dave Young <dyoung@xxxxxxxxxx> writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On 10/12/19 at 06:26am, Eric W. Biederman wrote:
>>>>> Lianbo Jiang <lijiang@xxxxxxxxxx> writes:
>>>>>
>>>>>> When the crashkernel kernel command line option is specified, the
>>>>>> low 1MiB memory will always be reserved, which makes that the memory
>>>>>> allocated later won't fall into the low 1MiB area, thereby, it's not
>>>>>> necessary to create a backup region and also no need to copy the first
>>>>>> 640k content to a backup region.
>>>>>>
>>>>>> Currently, the code related to the backup region can be safely removed,
>>>>>> so lets clean up.
>>>>>>
>>>>>> Signed-off-by: Lianbo Jiang <lijiang@xxxxxxxxxx>
>>>>>> ---
>>>>>
>>>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>>>>> index eb651fbde92a..cc5774fc84c0 100644
>>>>>> --- a/arch/x86/kernel/crash.c
>>>>>> +++ b/arch/x86/kernel/crash.c
>>>>>> @@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>>>>>  
>>>>>>  #ifdef CONFIG_KEXEC_FILE
>>>>>>  
>>>>>> -static unsigned long crash_zero_bytes;
>>>>>> -
>>>>>>  static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
>>>>>>  {
>>>>>>  	unsigned int *nr_ranges = arg;
>>>>>> @@ -234,9 +232,15 @@ static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
>>>>>>  {
>>>>>>  	struct crash_mem *cmem = arg;
>>>>>>  
>>>>>> -	cmem->ranges[cmem->nr_ranges].start = res->start;
>>>>>> -	cmem->ranges[cmem->nr_ranges].end = res->end;
>>>>>> -	cmem->nr_ranges++;
>>>>>> +	if (res->start >= SZ_1M) {
>>>>>> +		cmem->ranges[cmem->nr_ranges].start = res->start;
>>>>>> +		cmem->ranges[cmem->nr_ranges].end = res->end;
>>>>>> +		cmem->nr_ranges++;
>>>>>> +	} else if (res->end > SZ_1M) {
>>>>>> +		cmem->ranges[cmem->nr_ranges].start = SZ_1M;
>>>>>> +		cmem->ranges[cmem->nr_ranges].end = res->end;
>>>>>> +		cmem->nr_ranges++;
>>>>>> +	}
>>>>>
>>>>> What is going on with this chunk?  I can guess but this needs a clear
>>>>> comment.
>>>>
>>>> Indeed it needs some code comment, this is based on some offline
>>>> discussion.  cat /proc/vmcore will give a warning because ioremap is
>>>> mapping the system ram.
>>>>
>>>> We pass the first 1M to kdump kernel in e820 as system ram so that 2nd
>>>> kernel can use the low 1M memory because for example the trampoline
>>>> code.
>>>>
>>>>>
>>>>>>  
>>>>>>  	return 0;
>>>>>>  }
>>>>>
>>>>>> @@ -356,9 +337,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
>>>>>>  	memset(&cmd, 0, sizeof(struct crash_memmap_data));
>>>>>>  	cmd.params = params;
>>>>>>  
>>>>>> -	/* Add first 640K segment */
>>>>>> -	ei.addr = image->arch.backup_src_start;
>>>>>> -	ei.size = image->arch.backup_src_sz;
>>>>>> +	/*
>>>>>> +	 * Add the low memory range[0x1000, SZ_1M], skip
>>>>>> +	 * the first zero page.
>>>>>> +	 */
>>>>>> +	ei.addr = PAGE_SIZE;
>>>>>> +	ei.size = SZ_1M - PAGE_SIZE;
>>>>>>  	ei.type = E820_TYPE_RAM;
>>>>>>  	add_e820_entry(params, &ei);
>>>>>
>>>>> Likewise here.  Why do we need a special case?
>>>>> Why the magic with PAGE_SIZE?
>>>>
>>>> Good catch, the zero page part is useless, I think no other special
>>>> reason, just assumed zero page is not usable, but it should be ok to
>>>> remove the special handling, just pass 0 - 1M is good enough.
>>>
>>> But if we have stopped special casing the low 1M.  Why do we need a
>>> special case here at all?
>>>
>> Here, need to pass the low memory range to kdump kernel, which will guarantee
>> the availability of low memory in kdump kernel, otherwise, kdump kernel won't
>> use the low memory region.
>>
>>> If you need the special case it is almost certainly wrong to say you
>>> have ram above 640KiB and below 1MiB.  That is the legacy ROM and video
>>> MMIO area.
>>>
>>> There is a reason the original code said 640KiB.
>>>
>> Do you mean that the 640k region is good enough here instead of 1MiB?
> 
> Reading through the code of crash_setup_memap_entries I see that what
> the code is doing now.  The code is repeating the e820 memory map with
> the memory areas that were not reserved for the crash kernel removed.
> 
> In which case what the code needs to be doing something like:
> 
> 	cmd.type = E820_TYPE_RAM;
> 	flags = IORESOURCE_MEM;
> 	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, 1024*1024, &cmd,
> 			       memmap_entry_callback);
> 
The above code does not get the results what we expected, it gets the reserved
memory marked as 'IORES_DESC_RESERVED' in the low 1MiB range.

Finally, kdump kernel happened the panic as follow:
......
[    3.555662] Kernel panic - not syncing: Real mode trampoline was not allocated
[    3.556660] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc3+ #4
[    3.556660] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.556660] Call Trace:
[    3.556660]  dump_stack+0x46/0x60
[    3.556660]  panic+0xfb/0x2d7
[    3.556660]  ? hv_init_spinlocks+0x7f/0x7f
[    3.556660]  init_real_mode+0x27/0x1fa
[    3.556660]  ? hv_init_spinlocks+0x7f/0x7f
[    3.556660]  ? do_one_initcall+0x46/0x1e4
[    3.556660]  ? proc_register+0xd0/0x130
[    3.556660]  ? kernel_init_freeable+0xe2/0x242
[    3.556660]  ? rest_init+0xaa/0xaa
[    3.556660]  ? kernel_init+0xa/0x106
[    3.556660]  ? ret_from_fork+0x22/0x40
[    3.556660] Rebooting in 10 seconds..
[    3.556660] ACPI MEMORY or I/O RESET_REG.

I modified the above code, and tested it. This can find out the system ram in
the low 1MiB range. And it worked well.

@@ -356,11 +338,11 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
        memset(&cmd, 0, sizeof(struct crash_memmap_data));
        cmd.params = params;
 
+       /* Add the low 1MiB */
+       cmd.type = E820_TYPE_RAM;
+       flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+       walk_iomem_res_desc(IORES_DESC_NONE, flags, 0, 1024*1024 - 1, &cmd,
+                       memmap_entry_callback);

> Depending on which bugs exist it might make sense to limit this to
> the low 640KiB.  But finding something the kernel already recognizes
> as RAM should prevent most of those problems already.  Barring bugs
> I admit it doesn't make sense to repeat the work that someone else
> has already done.
> 
> This bit:
> 	/* Add e820 reserved ranges */
> 	cmd.type = E820_TYPE_RESERVED;
> 	flags = IORESOURCE_MEM;
> 	walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
> 			   memmap_entry_callback);
> 
> Should probably start at 1MiB instead of 0.  Just so we don't report the
If so, it can not find out the reserved memory marked as 'IORES_DESC_RESERVED' in
the low 1MiB range, finally, it doesn't pass the reserved memory in the low 1MiB to
kdump kernel, which could cause some problems, such as SME or PCI MMCONFIG issue.

> memory below 1MiB as unconditionally reserved.   I don't properly
> understand the IORES_DESC_RESERVED flag, and how that differs from
I found three commits about 'IORES_DESC_RESERVED' flag, hope this helps.
1.ae9e13d621d6 ("x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED")
2.5da04cc86d12 ("x86/mm: Rework ioremap resource mapping determination")
3.980621daf368 ("x86/crash: Add e820 reserved ranges to kdump kernel's e820 table")

> flags.  So please test my suggestions to verify the code works as
> expected.
> 
I have tested the two changes that you mentioned, please refer to the reply above.

Thanks.
Lianbo

> Eric
> 

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux