Simon Horman wrote: >> Hi, >> >> It should be mem_phdr, got it from mem_ehdr->e_phdr. >> >>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1 >>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1 >>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1 >>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4 >> Does anyone understand how the array were created and why there >> was a gap between i=0 and i=1 entries? I think this is the problem >> but i do not know how to fix it, so tried to work around it. >> >> The statement my patch replaced was totally broken: >> - if (loaded_segments[loaded_segments_num].end != >> - phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) >> - break; >> + if (loaded_segments[loaded_segments_num].end < >> + (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) ) >> + loaded_segments[loaded_segments_num].end >> + = phdr->p_paddr & ~(ELF_PAGE_SIZE-1); >> >> My debugging showed that when "loaded_segments[loaded_segments_num].end" >> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal >> and continue to next statement. However, if i assign both expression >> to local variables and do comparison, the 'break' statement is >> executed correctly when two values are not the same. Unfortunately, >> consequently the kdump kernel would _alawys_ hang. >> >> I believe the intent of the original statement is to ensure there is >> no gap between entries of mem_phdr array. But if there is a gap, >> kexec should simply exit with failure. The 'break' statement just >> created a loaded_segment[] array that broke the kernel memory segment >> into multiple entries and resulted in the kdump kernel hang in >> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in >> some cases today are simply out of luck. >> >> I believe the real fix is to fix the contents of the mem_phdr array. >> Since i do not know how to fix it, my patch would close up the >> gap where there is the a gap between entries of the mem_phdr array. >> >> Does it make more sense to you now, Simon? > > Hi Jay, > > yes that does make sense. I'd like to poke around and see > if mem_phdr can be fixed. I think the whole ehdr is read from the kernel binary in slurp_decompress_file. Bernhard reported a kdump kernel boot problem caused by a patch regarding per-cpu variables access in early boot code: http://article.gmane.org/gmane.linux.ports.ia64/19380 I backed out the offending patch and i was no longer able to reproduce this problem. So, it is safe to say the problem was due to how we process data from the vmlinuz. The code i tried to change: - if (loaded_segments[loaded_segments_num].end != - phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) - break; has two problems: 1) '!=' operation takes precedence over '&'. If the code is to do what it intends to do, the statement should be: if (loaded_segments[loaded_segments_num].end != (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) ) break; 2) When the 'break' is really executed, you breaks the kernel segment into multiple segments. The code needs fix even if the problem i saw was a result of a bug in the kernel. Thanks, jay >