Jay Lan wrote:
> Simon Horman wrote:
>> On Wed, Sep 03, 2008 at 02:01:59PM -0700, Jay Lan wrote:
>>> Sometimes kexec allocates too little memory for the kdump kernel
>>> itself on IA64, causing the kdump kernel to panic at boot.
>>>
>>> When it happens, /proc/iomem shows a kernel RAM segment like
>>> this:
>>> 3014000000-3015294fff : System RAM
>>>   3014000000-3014823ccf : Kernel code
>>>   3014823cd0-3014dee8ef : Kernel data
>>>   3014dee8f0-301529448f : Kernel bss
>>> 3015295000-307bffdfff : System RAM
>>>   3018000000-3037ffffff : Crash kernel
>>>
>>> But kexec would allocate memory 3018000000-3019290000 for the kernel,
>>> which is 0x5000 smaller than the regular kernel. In my case, the
>>> physical_node_map and kern_memmap of the kdump kernel overlapped and
>>> caused data corruption.
>>>
>>> This patch fixes the problem. The patch was generated against
>>> kexec-tools 2.0.0 and tested in 2.6.27-rc4.
>> Hi Jay,
>>
>> I am unclear about why this underallocation occurs.
>
> Hi Simon,
>
> The routine add_loaded_segments_info() sets up the "loaded_segments"
> array that is needed by the purgatory code, based on data stored in
> the mem_ehdr array passed in as the second parameter.
>
> Upon entry to the routine, crash_memory_range[] contains
> information about the regular kernel:
> crash_memory_range[ 0]: start= 3000080000, end= 30003fffff
> crash_memory_range[ 1]: start= 3003000000, end= 3005ffffff
> crash_memory_range[ 2]: start= 3006000000, end= 3013ffffff
> crash_memory_range[ 3]: start= 3014000000, end= 3015294fff
>
> The #3 entry is the kernel memory segment.
>
> And the mem_ehdr array would contain data as such:

Hi,

It should be mem_phdr, obtained from mem_ehdr->e_phdr.

> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4

Does anyone understand how the array was created and why there is a
gap between the i=0 and i=1 entries? I think this is the problem, but
I do not know how to fix it, so I tried to work around it.

The statement my patch replaced was totally broken:

-               if (loaded_segments[loaded_segments_num].end !=
-                       phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
-                       break;
+               if (loaded_segments[loaded_segments_num].end <
+                       (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
+                       loaded_segments[loaded_segments_num].end
+                               = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);

My debugging showed that when "loaded_segments[loaded_segments_num].end"
!= "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", the two were treated as equal
and execution continued to the next statement. However, if I assign
both expressions to local variables and compare those, the 'break'
statement is executed correctly when the two values differ.
Unfortunately, the kdump kernel would then _always_ hang.

I believe the intent of the original statement is to ensure there is
no gap between entries of the mem_phdr array. But if there is a gap,
kexec should simply exit with failure. The 'break' statement just
created a loaded_segments[] array that broke the kernel memory segment
into multiple entries and resulted in the kdump kernel hanging in
find_memory(). The cases where the IA64 (at least 2.6.27-rc4) kdump
kernel works today are simply a matter of luck.

I believe the real fix is to fix the contents of the mem_phdr array.
Since I do not know how to fix it, my patch closes up the gap wherever
there is one between entries of the mem_phdr array.

Does it make more sense to you now, Simon?

Regards,
 - jay

>
> The code wants every starting address in the new loaded_segments
> to be aligned on a page boundary, which is 0x10000 on IA64.
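
The behaviour reported for the replaced statement (unequal values
"treated as equal") is consistent with C operator precedence: != binds
more tightly than &, so the condition as written groups as
(end != p_paddr) & ~(ELF_PAGE_SIZE-1). The comparison yields 0 or 1,
and masking off the low 16 bits always leaves 0, so the 'break' can
never be taken. The sketch below is a standalone illustration, not
kexec-tools code: the helper names are made up for the example, and
ELF_PAGE_SIZE is assumed to be 0x10000 as stated for IA64 in this
thread.

```c
/* Assumption: IA64 page size as quoted in the thread. */
#define ELF_PAGE_SIZE 0x10000ULL

/* Silence the "suggest parentheses" warning; the missing parentheses
 * are the point of this example. */
#pragma GCC diagnostic ignored "-Wparentheses"

/* Hypothetical helper: the condition exactly as written in the
 * replaced statement. C groups it as (end != p_paddr) & mask. */
static int cond_as_written(unsigned long long end,
			   unsigned long long p_paddr)
{
	return end != p_paddr & ~(ELF_PAGE_SIZE-1);
}

/* Hypothetical helper: the comparison that was presumably intended,
 * with the mask applied to p_paddr before comparing. */
static int cond_with_parens(unsigned long long end,
			    unsigned long long p_paddr)
{
	return end != (p_paddr & ~(ELF_PAGE_SIZE-1));
}
```

With the values from the i=0/i=1 gap (end = 0x3018d10000, p_paddr =
0x3018d20000), cond_as_written() returns 0 while cond_with_parens()
returns 1. Assigning both sides to local variables before comparing
has the same effect as adding the parentheses, which would explain why
the 'break' started firing in that experiment.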
>
> Note that the p_memsz values of mem_ehdr do not match the entries in
> /proc/iomem:
> 3014000000-3015294fff : System RAM
>   3014000000-3014823ccf : Kernel code
>   3014823cd0-3014dee8ef : Kernel data
>   3014dee8f0-301529448f : Kernel bss
>
> The original code of add_loaded_segments_info() would go through
> the mem_ehdr array and use the p_paddr of the first entry (the
> beginning of the reserved memory) as the start address, then add
> the p_memsz of the three entries to calculate the end address of
> the kernel segment.
>
> The p_paddr of i=0 plus the p_memsz of i=0, rounded up to a page
> boundary, should give 3018d10000 as the p_paddr of the i=1 entry,
> but the p_paddr of i=1 is actually 3018d20000. The logic of that
> routine cannot explain the discrepancy.
>
> So, where does the data in the mem_ehdr array come from?
>
>   add_loaded_segments_info
>   <- load_crashdump_segments
>   <- elf_ia64_load
>   <- file_type[i].load
>   <- my_load
>
> elf_ia64_load sets up mem_ehdr, probably based on the data
> pointed to by *buf, which I think comes from vmlinuz.
>
> So, I failed to find out how the p_memsz values were set up
> initially. But I think we did it in a way that is too complicated,
> IMHO.
>
> The crash_memory_range[] array showed that the kernel segment
> consumed 0x1295000 bytes of memory, and we only need to tell the
> purgatory code to reserve that amount of memory. The logic in
> add_loaded_segments_info() came out with 0x1290000 and caused the
> crashkernel to panic on boot.
>
> Hmmm, as I type this, I may not have considered the situation where
> the crashkernel is not the same as the first kernel...
>
> Note that the underallocation does not _ALWAYS_ happen! It depends
> on the vmlinux we build. Honestly, I do not understand some parts of
> the kexec-tools code well enough to perform major surgery on it.
> So, I just compare the end address calculated from the i=0 entry
> of the mem_ehdr array with the start address of the second entry. If
> it is too small, I just bring it up to align with the start address
> of the second entry.
> I am happier to allocate one extra page of memory, which may not be
> needed in some cases, than to panic. Yes, my patch is a work-around.
>
> If you can find the true cause of the problem and fix it, it
> would be great and appreciated!
>
> Regards,
>  - jay
>
>
>>> Signed-off-by: Jay Lan <jlan at sgi.com>
>>>
>>> ---
>>>  kexec/arch/ia64/crashdump-ia64.c |    8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> Index: kexec-tools/kexec/arch/ia64/crashdump-ia64.c
>>> ===================================================================
>>> --- kexec-tools.orig/kexec/arch/ia64/crashdump-ia64.c 2008-09-03 11:24:14.289758063 -0700
>>> +++ kexec-tools/kexec/arch/ia64/crashdump-ia64.c 2008-09-03 11:29:34.095833316 -0700
>>> @@ -90,15 +90,15 @@ static void add_loaded_segments_info(str
>>>                 phdr = &ehdr->e_phdr[i];
>>>                 if (phdr->p_type != PT_LOAD)
>>>                         break;
>>> -               if (loaded_segments[loaded_segments_num].end !=
>>> -                       phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>>> -                       break;
>>> +               if (loaded_segments[loaded_segments_num].end <
>>> +                       (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>>> +                       loaded_segments[loaded_segments_num].end
>>> +                               = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>>>                 loaded_segments[loaded_segments_num].end +=
>>>                         (phdr->p_memsz + ELF_PAGE_SIZE - 1) &
>>>                         ~(ELF_PAGE_SIZE - 1);
>>>                 i++;
>>>         }
>>> -
>>>         loaded_segments_num++;
>>>     }
>>>  }
>
> _______________________________________________
> kexec mailing list
> kexec at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
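
The numbers quoted in this thread can be traced through the loop by
hand. The sketch below is only an illustration under stated
assumptions, not the kexec-tools implementation: struct phdr_sketch
and both function names are hypothetical, the segment is assumed to
start at the page-aligned p_paddr of the first entry, and the
pre-patch version omits the gap check on the assumption that it never
fired. The table holds the four mem_phdr entries printed above.

```c
#include <assert.h>
#include <stddef.h>

#define ELF_PAGE_SIZE 0x10000ULL	/* assumed IA64 page size from the thread */
#define PT_LOAD 1

/* Hypothetical stand-in for the program header fields used here. */
struct phdr_sketch {
	unsigned long long p_paddr;
	unsigned long long p_memsz;
	int p_type;
};

/* The mem_phdr entries as printed in the thread. */
static const struct phdr_sketch thread_phdrs[] = {
	{ 0x3018000000ULL, 0xd04480ULL, 1 },
	{ 0x3018d20000ULL, 0x9620ULL,   1 },
	{ 0x3018d30000ULL, 0x564490ULL, 1 },
	{ 0x0ULL,          0x0ULL,      4 },
};

static unsigned long long page_round_up(unsigned long long v)
{
	return (v + ELF_PAGE_SIZE - 1) & ~(ELF_PAGE_SIZE - 1);
}

/* Pre-patch behaviour as assumed above: page-rounded sizes are summed
 * from the first start address and gaps between entries are ignored. */
static unsigned long long end_before_patch(const struct phdr_sketch *p,
					   size_t n)
{
	assert(n > 0);
	unsigned long long end = p[0].p_paddr & ~(ELF_PAGE_SIZE - 1);

	for (size_t i = 0; i < n && p[i].p_type == PT_LOAD; i++)
		end += page_round_up(p[i].p_memsz);
	return end;
}

/* Patched behaviour: if the running end falls short of the next
 * entry's page-aligned start, pull it up before adding the size. */
static unsigned long long end_after_patch(const struct phdr_sketch *p,
					  size_t n)
{
	assert(n > 0);
	unsigned long long end = p[0].p_paddr & ~(ELF_PAGE_SIZE - 1);

	for (size_t i = 0; i < n && p[i].p_type == PT_LOAD; i++) {
		unsigned long long start = p[i].p_paddr & ~(ELF_PAGE_SIZE - 1);

		if (end < start)
			end = start;
		end += page_round_up(p[i].p_memsz);
	}
	return end;
}
```

For the table above, end_before_patch() gives 0x3019290000, i.e. the
0x1290000 bytes reported in the thread, which is 0x5000 short of the
0x1295000 bytes used by the running kernel. end_after_patch() gives
0x30192a0000, i.e. 0x12a0000 bytes, which covers the kernel at the
cost of at most one extra page.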