Re: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump

Andy Whitcroft <apw@xxxxxxxxxxxx> · Fri, 27 Oct 2006 10:15:16 +0100

Zou, Nanhai wrote:
> 
>> -----Original Message-----
>> From: linux-ia64-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-ia64-owner@xxxxxxxxxxxxxxx] On Behalf Of Zou, Nanhai
>> Sent: 27 October 2006 9:41
>> To: Mel Gorman; Horms
>> Cc: linux-ia64@xxxxxxxxxxxxxxx; Linus Torvalds; Bob Picco; Andrew Morton; Dave
>> Hansen; Andy Whitcroft; Andi Kleen; Benjamin Herrenschmidt; Paul Mackerras;
>> Keith Mannthey; Luck, Tony; KAMEZAWA Hiroyuki; Yasunori Goto; Khalid Aziz
>> Subject: RE: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>
>>
>>> -----Original Message-----
>>> From: Mel Gorman [mailto:mel@xxxxxxxxx]
>>> Sent: 26 October 2006 21:27
>>> To: Horms
>>> Cc: linux-ia64@xxxxxxxxxxxxxxx; Linus Torvalds; Bob Picco; Andrew Morton;
>>> Dave Hansen; Andy Whitcroft; Andi Kleen; Benjamin Herrenschmidt;
>>> Paul Mackerras; Keith Mannthey; Luck, Tony; KAMEZAWA Hiroyuki;
>>> Yasunori Goto; Zou, Nanhai; Khalid Aziz
>>> Subject: Re: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>>
>>> From mel@xxxxxxxxx Thu Oct 26 14:10:39 2006
>>> Date: Thu, 26 Oct 2006 14:10:39 +0100 (IST)
>>> From: Mel Gorman <mel@xxxxxxxxx>
>>> To: Andy Whitcroft <apw@xxxxxxxxxxxx>
>>> Subject: Re: 05e0caad3b7bd0d0fbeff980bca22f186241a501 breaks ia64 kdump
>>>
>>> On Thu, 26 Oct 2006, Horms wrote:
>>>
>>>> Hi,
>>>>
>>>> After doing a bit of research it seems that ia64 kdump is broken
>>>> by 05e0caad3b7bd0d0fbeff980bca22f186241a501, which appeared between
>>>> 2.6.18 and 2.6.19-rc3. I can be more specific about the version if
>>>> need be, but here is the commit log from Linus' tree.
>>>>
>>> Ok, Andy Whitcroft and I both took a few kicks at this problem to see what
>>> the story was. My current understanding of kdump (given to me by Andy) is
>>> this:
>>>
>>> 1. Normal kernel boots and leaves a kdump hole in memory somewhere
>>> 2. In the kdump hole, a crash dump kernel is loaded
>>> 3. Things run happily for a while until something goes wrong. kexec is
>>>     called on the kernel image in the kdump hole
>>> 4. kdump kernel starts and creates an image
>>>
>>> Grand so far.
>>>
>>> Now, with arch-independent zone-sizing, an architecture states where "real"
>>> memory is and memmap is initialised in those ranges.
>>>
>>> The maps of the two kernels look like this
>>>
>>> Normal Kernel
>>>> early_node_map[7] active PFN ranges
>>>>    0:     1025 ->     4096
>>>>    0:     4567 ->    16384
>>>>    0:    32768 ->   125911
>>>>    0:   126514 ->   127540
>>>>    0:   127541 ->   128557
>>>>    0:   128576 ->   130688
>>>>    0:   130984 ->   130998
>>> Crash kernel
>>>> early_node_map[7] active PFN ranges
>>>>    0:    16855 ->    16856
>>>>    0:    16857 ->    32096
>>>>    0:    32752 ->    32753
>>>>    0:    32754 ->    32755
>>>>    0:    32756 ->    32757
>>>>    0:    32758 ->    32761
>>>>    0:    32762 ->    32768
>>> So, there is clearly a hole there between 16384 -> 32768 for the kdump hole
>>> in the normal kernel. I expect the kernel image and __init sections are
>>> located at PFN 16384.
>>>
>>> The problem is that the crash kernel is reporting that memory starts at
>>> 16855, a gap of 471 page frames! memmap will not be initialised here because
>>> the memory "doesn't exist", even though the memmap itself will be allocated
>>> because of MAX_ORDER-alignment issues.
>>>
>>> The first fault looks like this
>>>
>>>> page:a0007ffffff23598 flags:0x0000000000000000 mapping:0000000000000000
>>>> mapcount:1 count:0
>>> Based on the value of virtual mem_map, that is at PFN 16629 or about 245
>>> page frames into the kernel image. In the stack trace, you see
>>> free_initmem() is being called. i.e. the __init section appears in a memory
>>> hole where memmap was never initialised.
>>>
>>> I haven't looked at how kdump works yet, but you are either supplying a fake
>>> EFI map that omits the kernel image or else you only read a portion of the
>>> EFI map when booting a crash kernel and start reading after the kernel image
>>> ends. If the EFI map covers the kernel image, you'll see an entry like this
>>> in the early_node_map
>>>
>>> 0: 16384 -> 16855
>>>
>>> and that bad_page() will disappear.
>>>
>>> We'll start kicking at the kdump patches now, but maybe a kdump expert can
>>> tell offhand why the crash kernel's EFI map does not cover the kernel image.
>>>
>> The EFI memmap is changed in the purgatory code.
>> I mark old EFI memmap entries that have attribute EFI_LOADER_DATA as
>> EFI_CONVENTIONAL_MEMORY, then mark the range of the crash kernel image as
>> EFI_LOADER_DATA. During this, some EFI memmap ranges may be split, but the
>> overall layout is not changed.
>>
>> I am building 2.6.19-rc3 to see if I can reproduce the issue.
>>
>> Thanks
>> Zou Nan hai
> 
> 
> Hi Neil,
>  I can't reproduce the issue with 2.6.19-rc3.
>  Is there any special config option needed to reproduce it?

Hi,

Mel and I spent a bit more time thinking about this.  If the EFI map is
being modified such that the kernel area becomes loader data, that may
well move the start of conventional memory up, past the kernel image, as
far as the running kernel is concerned.
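
To make that concrete, here is a minimal user-space sketch of the
retagging Zou describes.  It is a toy model only, not the purgatory
code: the types, the function names and the single 16384 -> 32096 input
range are all assumptions made for illustration, and the image range
16384 -> 16855 is simply the entry Mel expected to see.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the EFI memory types (hypothetical names). */
enum mem_type { MEM_CONVENTIONAL, MEM_LOADER_DATA };

/* One map entry covering page frames [start_pfn, end_pfn). */
struct map_entry {
	unsigned long start_pfn;
	unsigned long end_pfn;
	enum mem_type type;
};

/*
 * Retag [img_start, img_end) as loader data, splitting the entry that
 * contains it.  This sketch assumes the image lies inside one entry.
 * 'out' must have room for n + 2 entries; returns the entries written.
 */
static size_t retag_image(const struct map_entry *in, size_t n,
			  unsigned long img_start, unsigned long img_end,
			  struct map_entry *out)
{
	size_t i, k = 0;

	assert(img_start < img_end);
	for (i = 0; i < n; i++) {
		const struct map_entry *e = &in[i];

		if (img_end <= e->start_pfn || img_start >= e->end_pfn) {
			out[k++] = *e;	/* no overlap: copy unchanged */
			continue;
		}
		assert(img_start >= e->start_pfn && img_end <= e->end_pfn);
		if (e->start_pfn < img_start)	/* piece before the image */
			out[k++] = (struct map_entry){ e->start_pfn, img_start, e->type };
		out[k++] = (struct map_entry){ img_start, img_end, MEM_LOADER_DATA };
		if (img_end < e->end_pfn)	/* piece after the image */
			out[k++] = (struct map_entry){ img_end, e->end_pfn, e->type };
	}
	return k;
}

/* Store the lowest conventional PFN in *pfn; return 0 if there is none. */
static int first_conventional_pfn(const struct map_entry *map, size_t n,
				  unsigned long *pfn)
{
	size_t i;
	int found = 0;

	for (i = 0; i < n; i++) {
		if (map[i].type != MEM_CONVENTIONAL)
			continue;
		if (!found || map[i].start_pfn < *pfn)
			*pfn = map[i].start_pfn;
		found = 1;
	}
	return found;
}
```

In this model, retagging 16384 -> 16855 inside a conventional range
16384 -> 32096 leaves conventional memory starting at 16855, which
matches the first PFN the crash kernel reports.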

[Apologies in advance if Zou is not an appropriate name.]

It would be helpful to have a dump of the EFI map and the start
address of the kernel from the affected machine (Horms) and, if possible,
the same from your working test platform (Zou).  The boot logs from a
conventional kernel and the kexec kernel on the working test platform
(Zou) would also be useful.

A logical next step might be to bodge things such that we offer up the
kernel image as an active range and see if that sorts out the alignment
issue we are seeing.  This will allow us to be certain it is the kernel
image in this area.  Something like the following inserted into
register_memory() might work:

	add_active_range(0, code_resource.start >> PAGE_SHIFT,
			    data_resource.end >> PAGE_SHIFT);

Not sure this is the right thing as a fix, but it would help confirm the
theory.
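
The theory can also be checked on paper against the crash kernel's
early_node_map quoted above.  The following is a minimal user-space
sketch, not kernel code: the struct and helper names are hypothetical,
the image range 16384 -> 16855 is the entry Mel expected to see, and
the conversion helper assumes a resource-style inclusive end address
and an exclusive end PFN (as the "start -> end" ranges in the map
suggest).  If both assumptions hold, the end PFN may need a "+ 1"
after the shift.

```c
#include <assert.h>
#include <stddef.h>

/* Active page-frame range [start_pfn, end_pfn), end exclusive. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* Crash kernel early_node_map as reported earlier in this thread. */
static const struct pfn_range crash_map[] = {
	{ 16855, 16856 }, { 16857, 32096 }, { 32752, 32753 },
	{ 32754, 32755 }, { 32756, 32757 }, { 32758, 32761 },
	{ 32762, 32768 },
};
#define CRASH_MAP_LEN (sizeof(crash_map) / sizeof(crash_map[0]))

/*
 * Convert a byte range with an INCLUSIVE end address into an
 * exclusive-end PFN range.  page_shift is a parameter because the
 * page size is configuration dependent.
 */
static struct pfn_range bytes_to_pfn_range(unsigned long start,
					   unsigned long end_inclusive,
					   unsigned int page_shift)
{
	struct pfn_range r;

	assert(start <= end_inclusive);
	r.start_pfn = start >> page_shift;
	r.end_pfn = (end_inclusive >> page_shift) + 1;
	return r;
}

/* Return 1 if pfn falls inside any of the n ranges, else 0. */
static int pfn_is_active(const struct pfn_range *r, size_t n,
			 unsigned long pfn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pfn >= r[i].start_pfn && pfn < r[i].end_pfn)
			return 1;
	return 0;
}

/* Count the PFNs in [start, end) that no range covers. */
static unsigned long uncovered_pfns(const struct pfn_range *r, size_t n,
				    unsigned long start, unsigned long end)
{
	unsigned long pfn, missing = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++)
		if (!pfn_is_active(r, n, pfn))
			missing++;
	return missing;
}
```

Against crash_map, uncovered_pfns() over 16384 -> 16855 gives the 471
frame gap, and PFN 16629 from the first bad page is not active.  With a
16384 -> 16855 entry added to the map, both checks come back clean.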

-apw