On Fri, Sep 12, 2008 at 05:38:09PM -0700, Jay Lan wrote: > Simon Horman wrote: > >> Hi, > >> > >> It should be mem_phdr, got it from mem_ehdr->e_phdr. > >> > >>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1 > >>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1 > >>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1 > >>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4 > >> Does anyone understand how the array were created and why there > >> was a gap between i=0 and i=1 entries? I think this is the problem > >> but i do not know how to fix it, so tried to work around it. > >> > >> The statement my patch replaced was totally broken: > >> - if (loaded_segments[loaded_segments_num].end != > >> - phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) > >> - break; > >> + if (loaded_segments[loaded_segments_num].end < > >> + (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) ) > >> + loaded_segments[loaded_segments_num].end > >> + = phdr->p_paddr & ~(ELF_PAGE_SIZE-1); > >> > >> My debugging showed that when "loaded_segments[loaded_segments_num].end" > >> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal > >> and continue to next statement. However, if i assign both expression > >> to local variables and do comparison, the 'break' statement is > >> executed correctly when two values are not the same. Unfortunately, > >> consequently the kdump kernel would _alawys_ hang. > >> > >> I believe the intent of the original statement is to ensure there is > >> no gap between entries of mem_phdr array. But if there is a gap, > >> kexec should simply exit with failure. The 'break' statement just > >> created a loaded_segment[] array that broke the kernel memory segment > >> into multiple entries and resulted in the kdump kernel hang in > >> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in > >> some cases today are simply out of luck. > >> > >> I believe the real fix is to fix the contents of the mem_phdr array. > >> Since i do not know how to fix it, my patch would close up the > >> gap where there is the a gap between entries of the mem_phdr array. > >> > >> Does it make more sense to you now, Simon? > > > > Hi Jay, > > > > yes that does make sense. I'd like to poke around and see > > if mem_phdr can be fixed. > > I think the whole ehdr is read from the kernel binary in > slurp_decompress_file. > > Bernhard reported a kdump kernel boot problem caused by a patch > regarding per-cpu variables access in early boot code: > http://article.gmane.org/gmane.linux.ports.ia64/19380 > > I backed out the offending patch and i was no longer able to > reproduce this problem. > > So, it is safe to say the problem was due to how we process > data from the vmlinuz. > > The code i tried to change: > - if (loaded_segments[loaded_segments_num].end != > - phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) > - break; > has two problems: > 1) '!=' operation takes precedence over '&'. If the code is to > do what it intends to do, the statement should be: > if (loaded_segments[loaded_segments_num].end != > (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) ) > break; Good point. That is definately a problem. > 2) When the 'break' is really executed, you breaks the kernel > segment into multiple segments. I also agree that is a problem. I just wonder what is the best thing to do about it. > The code needs fix even if the problem i saw was a result of > a bug in the kernel. Agreed. For now can you make a patch that just fixes precedence logic bug? Lets merge that and then tackle the more complex broken segment issue. -- Simon Horman VA Linux Systems Japan K.K., Sydney, Australia Satellite Office H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en