Hi Ming,

thank you for the detailed description.

On 2024/10/08 17:43, Ming Wang wrote:
>> On 2024/08/14 11:43, Ming Wang wrote:
>>> When iterating through LOAD segments to exclude pages in
>>> `exclude_nodata_pages`, simply using `phys_end` to track allocated
>>> PFNs can lead to the erroneous exclusion of valid pages from a
>>> subsequent LOAD segment if its `phys_start` overlaps with the
>>> previous segment's address range.
>>>
>>> This patch addresses the issue by checking for such overlaps with the
>>> next LOAD segment's physical address range. If an overlap is
>>> detected, we continue including PFNs from the current segment until
>>> reaching the end of the overlapping region.
>>>
>>> This fix ensures that all valid pages within overlapping LOAD segments
>>> are correctly included during PFN exclusion.
>>
>> I have a few questions:
>>
>> - I heard from Masa that the overlap is caused by a bug, isn't it fixed
>>   on the kernel side? or it will be fixed but is this patch for existing
>>   kernels that have the bug?
> The overlap you mentioned is not caused by a kernel bug and is not
> expected to be fixed in the kernel.

I see.

>>
>> - exclude_nodata_pages() is for dumpfiles created by "makedumpfile -E",
>>   which uses p_filesz in program headers to record existing pages after
>>   exclusion. The function excludes pages only between
>>   phys_start+file_size and phys_end, so if p_filesz == p_memsz, it will
>>   not exclude any pages.
> My system's page size is 16K. Consider the following LOAD segment information:
>
>              phys_start    phys_end    virt_start          virt_end
> LOAD[ 0]         200000     2186200    9000000000200000    9000000002186200
> LOAD[ 1]         200000     ee00000    9000000000200000    900000000ee00000
>
> I added some print statements within the exclude_nodata_pages function:
>
>     while (pfn < pfn_end) {
>         ERRMSG("\nfunc:%s: pfn: %#llx phys_start: %#llx phys_end: %#llx file_size: %lu pfn_end:%#llx\n",
>                __func__, pfn, phys_start, phys_end, (unsigned long)file_size, pfn_end);
>         clear_bit_on_2nd_bitmap(pfn, cycle);
>         ++pfn;
>     }
>
> The following output was observed:
>
> func:exclude_nodata_pages: pfn: 0x861 phys_start: 0x200000 phys_end: 0x2186200 file_size: 33055232 pfn_end:0x862
>
> After this print statement, clear_bit_on_2nd_bitmap is called, effectively
> clearing the bitmap bit for pfn: 0x861. However, pfn: 0x861 is a valid PFN
> that happens to store the kernel's init_uts_ns. This leads to an error when
> the crash tool attempts to detect the kernel version:
>
> read_diskdump: PAGE_EXCLUDED: paddr/pfn: 2185d98/861
> crash: page excluded: kernel virtual address: 9000000002185d98  type: "init_uts_ns"

So the calculated pfn:0x861 is wrong?

>   LOAD           0x000000000000c000 0x9000000000200000 0x0000000000200000
>                  0x0000000002550400 0x0000000002550400  RWE    0x0

This is a different example; here it's expected that p_memsz equals p_filesz.
But the sizes are not aligned to the page size of 16k, and with

    pfn = paddr_to_pfn(phys_start + file_size);

it looks like this rounds down into the last valid page..?  So

-    pfn = paddr_to_pfn(phys_start + file_size);
+    pfn = paddr_to_pfn(roundup(phys_start + file_size, PAGESIZE()));
     pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE()));

How about this instead of your patch?

Thanks,
Kazu

>> I would like to check how they are on your machine, so could I have a
>>   "readelf -l /proc/vmcore" output?
> The output of the command readelf -l /proc/vmcore executed on my system is
> provided below.
>
> It is important to note that this output corresponds to a different kernel
> version compared to the one used in the preceding example:
>
> [root@localhost ~]# readelf -l /proc/vmcore
>
> Elf file type is CORE (Core file)
> Entry point 0x0
> There are 8 program headers, starting at offset 64
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   NOTE           0x0000000000004000 0x0000000000000000 0x0000000000000000
>                  0x0000000000004a54 0x0000000000004a54         0x4
>   LOAD           0x000000000000c000 0x9000000000200000 0x0000000000200000
>                  0x0000000002550400 0x0000000002550400  RWE    0x0
>   LOAD           0x0000000002560000 0x9000000000200000 0x0000000000200000
>                  0x000000000ec00000 0x000000000ec00000  RWE    0x0
>   LOAD           0x0000000011160000 0x9000000090400000 0x0000000090400000
>                  0x0000000011200000 0x0000000011200000  RWE    0x0
>   LOAD           0x0000000022360000 0x90000000f8e00000 0x00000000f8e00000
>                  0x00000000000c0000 0x00000000000c0000  RWE    0x0
>   LOAD           0x0000000022420000 0x90000000f8ed0000 0x00000000f8ed0000
>                  0x0000000004c70000 0x0000000004c70000  RWE    0x0
>   LOAD           0x0000000027090000 0x90000000fe100000 0x00000000fe100000
>                  0x0000000f81f00000 0x0000000f81f00000  RWE    0x0
>   LOAD           0x0000000fa8f90000 0x9000100080000000 0x0000100080000000
>                  0x0000001000000000 0x0000001000000000  RWE    0x0

>>
>> - Do you mean that the bug always puts the two LOAD segments that have
>>   the same phys_start in a row?
> Yes, as demonstrated in the example above.
>
> Thanks,
> Ming

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec