Hi Kazu,
Thank you for your insightful analysis and suggestion!
On 2024/10/9 09:47, HAGIO KAZUHITO(萩尾 一仁) wrote:
Hi Ming,
thank you for the detailed description.
On 2024/10/08 17:43, Ming Wang wrote:
On 2024/08/14 11:43, Ming Wang wrote:
When iterating through LOAD segments to exclude pages in
`exclude_nodata_pages`, simply using `phys_end` to track allocated
PFNs can lead to the erroneous exclusion of valid pages from a
subsequent LOAD segment if its `phys_start` overlaps with the
previous segment's address range.
This patch addresses the issue by checking for such overlaps with the
next LOAD segment's physical address range. If an overlap is
detected, we continue including PFNs from the current segment until
reaching the end of the overlapping region.
This fix ensures that all valid pages within overlapping LOAD segments
are correctly included during PFN exclusion.
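A minimal sketch of that idea (not the actual patch; struct load_seg, segs[]
and nr_segs are hypothetical stand-ins for makedumpfile's LOAD-segment
bookkeeping):

struct load_seg {
        unsigned long long phys_start;
        unsigned long long phys_end;
        unsigned long long file_size;   /* p_filesz */
};

/*
 * Physical address at which exclusion may start for segment i:
 * normally phys_start + file_size, but if the next LOAD segment
 * overlaps this segment's range, keep including PFNs until the end
 * of the overlapping region.
 */
static unsigned long long
exclude_start(struct load_seg *segs, int nr_segs, int i)
{
        unsigned long long start = segs[i].phys_start + segs[i].file_size;
        unsigned long long end = segs[i].phys_end;

        if (i + 1 < nr_segs) {
                unsigned long long nstart = segs[i + 1].phys_start;
                unsigned long long nend = segs[i + 1].phys_end;

                /* The overlapped part is still backed by file data in
                 * the next segment, so don't exclude it. */
                if (nstart < end && nend > start)
                        start = (nend < end) ? nend : end;
        }
        return start;
}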
I have a few questions:
- I heard from Masa that the overlap is caused by a bug; isn't it fixed
on the kernel side? Or will it be fixed later, and is this patch for
existing kernels that have the bug?
The overlap you mentioned is not caused by a kernel bug and is not expected to be fixed in the kernel.
I see.
- exclude_nodata_pages() is for dumpfiles created by "makedumpfile -E",
which uses p_filesz in program headers to record existing pages after
exclusion. The function excludes pages only between
phys_start+file_size and phys_end, so if p_filesz == p_memsz, it will
not exclude any pages.
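Roughly, in code (a sketch assembled from the snippets quoted later in this
message; the variable declarations are mine):

unsigned long long pfn, pfn_end;

/* Only PFNs in [phys_start + file_size, phys_end) are cleared, so a
 * segment with p_filesz == p_memsz excludes nothing (leaving aside the
 * page-alignment issue discussed below). */
pfn = paddr_to_pfn(phys_start + file_size);
pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE()));

while (pfn < pfn_end) {
        clear_bit_on_2nd_bitmap(pfn, cycle);
        ++pfn;
}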
My system's page size is 16K. Consider the following LOAD segment information:
phys_start phys_end virt_start virt_end
LOAD[ 0] 200000 2186200 9000000000200000 9000000002186200
LOAD[ 1] 200000 ee00000 9000000000200000 900000000ee00000
I added some print statements within the exclude_nodata_pages function:
while (pfn < pfn_end) {
        ERRMSG("\nfunc:%s: pfn: %#llx phys_start: %#llx phys_end: %#llx file_size: %lu pfn_end:%#llx\n",
               __func__, pfn, phys_start, phys_end, (unsigned long)file_size, pfn_end);
        clear_bit_on_2nd_bitmap(pfn, cycle);
        ++pfn;
}
The following output was observed:
func:exclude_nodata_pages: pfn: 0x861 phys_start: 0x200000 phys_end: 0x2186200 file_size: 33055232 pfn_end:0x862
After this print statement, clear_bit_on_2nd_bitmap is called, effectively clearing the bitmap bit for pfn: 0x861.
However, pfn: 0x861 is a valid PFN that happens to store the kernel's init_uts_ns. This leads to an error when the
crash tool attempts to detect the kernel version:
read_diskdump: PAGE_EXCLUDED: paddr/pfn: 2185d98/861
crash: page excluded: kernel virtual address: 9000000002185d98 type: "init_uts_ns"
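To make the arithmetic behind that pfn concrete, here is a small standalone
program (my own illustration, assuming the 16K page size; the macros mimic
the makedumpfile helpers but are not its code):

#include <stdio.h>

#define PAGESIZE        0x4000ULL       /* 16K */
#define paddr_to_pfn(p) ((p) / PAGESIZE)
#define roundup(x, y)   ((((x) + (y) - 1) / (y)) * (y))

int main(void)
{
        /* LOAD[ 0] values from the output above */
        unsigned long long phys_start = 0x200000ULL;
        unsigned long long phys_end = 0x2186200ULL;
        unsigned long long file_size = 33055232ULL;     /* 0x1f86200 */

        unsigned long long pfn = paddr_to_pfn(phys_start + file_size);
        unsigned long long pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE));

        /* phys_start + file_size == phys_end == 0x2186200, which is not
         * 16K-aligned, so pfn rounds down to 0x861 while pfn_end is
         * 0x862: the last, partially filled page gets cleared even
         * though it holds valid data (init_uts_ns lives at 0x2185d98,
         * i.e. inside PFN 0x861). */
        printf("pfn = %#llx, pfn_end = %#llx\n", pfn, pfn_end);
        return 0;
}

It prints pfn = 0x861, pfn_end = 0x862, matching the debug output above.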
So the calculated pfn:0x861 is wrong?
Yes, it's wrong.
LOAD 0x000000000000c000 0x9000000000200000 0x0000000000200000
0x0000000002550400 0x0000000002550400 RWE 0x0
This is a different example; it's expected that p_memsz equals p_filesz.
But the sizes are not aligned to the 16K page size, and
pfn = paddr_to_pfn(phys_start + file_size);
it looks like this rounds down the last valid page...? So
- pfn = paddr_to_pfn(phys_start + file_size);
+ pfn = paddr_to_pfn(roundup(phys_start + file_size, PAGESIZE()));
pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE()));
How about this instead of your patch?
I agree that rounding phys_start + file_size up to the next page
boundary before converting it to a PFN is a better approach: it ensures
that the last valid page of a LOAD segment is not accidentally excluded.
I've implemented your suggestion:
- pfn = paddr_to_pfn(phys_start + file_size);
+ pfn = paddr_to_pfn(roundup(phys_start + file_size, PAGESIZE()));
and tested it thoroughly. It effectively resolves the issue of valid
pages being excluded due to rounding errors.
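With the roundup applied, the LOAD[0] numbers from the earlier example work
out to roundup(0x200000 + 0x1f86200, 0x4000) = 0x2188000, so pfn = 0x862 =
pfn_end: the while loop clears nothing for that segment, and PFN 0x861
(which holds init_uts_ns) stays set in the 2nd bitmap.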
I've updated the patch accordingly and will send v2 soon.
Thanks again for your help!
Best regards,
Ming
Thanks,
Kazu
I would like to check how they look on your machine, so could I have
the "readelf -l /proc/vmcore" output?
The output of readelf -l /proc/vmcore on my system is provided below.
Note that this output was taken with a different kernel version than the one used in the preceding example:
[root@localhost ~]# readelf -l /proc/vmcore
Elf file type is CORE (Core file)
Entry point 0x0
There are 8 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
NOTE 0x0000000000004000 0x0000000000000000 0x0000000000000000
0x0000000000004a54 0x0000000000004a54 0x4
LOAD 0x000000000000c000 0x9000000000200000 0x0000000000200000
0x0000000002550400 0x0000000002550400 RWE 0x0
LOAD 0x0000000002560000 0x9000000000200000 0x0000000000200000
0x000000000ec00000 0x000000000ec00000 RWE 0x0
LOAD 0x0000000011160000 0x9000000090400000 0x0000000090400000
0x0000000011200000 0x0000000011200000 RWE 0x0
LOAD 0x0000000022360000 0x90000000f8e00000 0x00000000f8e00000
0x00000000000c0000 0x00000000000c0000 RWE 0x0
LOAD 0x0000000022420000 0x90000000f8ed0000 0x00000000f8ed0000
0x0000000004c70000 0x0000000004c70000 RWE 0x0
LOAD 0x0000000027090000 0x90000000fe100000 0x00000000fe100000
0x0000000f81f00000 0x0000000f81f00000 RWE 0x0
LOAD 0x0000000fa8f90000 0x9000100080000000 0x0000100080000000
0x0000001000000000 0x0000001000000000 RWE 0x0
- Do you mean that the bug always puts the two LOAD segments that have
the same phys_start in a row?
Yes, as demonstrated in the readelf output above: the first two LOAD
segments both have PhysAddr 0x200000 (with FileSiz 0x2550400 and
0xec00000 respectively).
Thanks,
Ming