[PATCH 18/19] arm64: kdump: update a kernel doc

takahiro.akashi@xxxxxxxxxx (AKASHI Takahiro) · Wed, 20 Jan 2016 14:25:07 +0900

On 01/19/2016 11:01 PM, Mark Rutland wrote:
> On Tue, Jan 19, 2016 at 09:45:53PM +0800, Dave Young wrote:
>> On 01/19/16 at 12:51pm, Mark Rutland wrote:
>>> On Tue, Jan 19, 2016 at 08:28:48PM +0800, Dave Young wrote:
>>>> On 01/19/16 at 02:35pm, AKASHI Takahiro wrote:
>>>>> On 01/19/2016 10:43 AM, Dave Young wrote:
>>>>>> X86 takes another approach in the latest kexec-tools and kexec_file_load:
>>>>>> it recreates the E820 table and passes it to the kexec/kdump kernel; if the
>>>>>> entries exceed the E820 limit, it uses the setup_data list for the
>>>>>> remaining entries.
>>>>>
>>>>> Thanks. I will visit x86 code again.
>>>>>
>>>>>> I think it is X86 specific. Personally I think device tree property is
>>>>>> better.
>>>>>
>>>>> Do you think so?
>>>>
>>>> I'm not sure it is the best way. For X86 we ran into a problem with the
>>>> memmap= design; one example is that pci domain X (X>1) needs the pci memory
>>>> ranges passed to the kdump kernel. When we passed the reserved ranges in
>>>> /proc/iomem to the 2nd kernel, we found that the cmdline[] array is not big
>>>> enough.
>>>
>>> I'm not sure how PCI ranges relate to the memory map used for normal
>>> memory (i.e. RAM), though I'm probably missing some caveat with the way
>>> ACPI and UEFI describe PCI. Why does memmap= affect PCI memory?
>>
>> Here is the old patch which was rejected in kexec-tools:
>> http://lists.infradead.org/pipermail/kexec/2013-February/007924.html
>>
>>>
>>> If the kernel got the rest of its system topology from DT, the PCI
>>> regions would be described there.
>>
>> Yes, if the kdump kernel uses the same DT as the 1st kernel.
>
> Other than for testing purposes, I don't see why you'd pass the kdump
> kernel a DTB inconsistent with the one the 1st kernel was passed (other
> than some properties under /chosen).
>
> We added /sys/firmware/fdt specifically to allow the kexec tools to get
> the exact DTB the first kernel used. There's no reason for tools to have
> to make something up.

Currently, arm64 kexec-tools modifies only the cmdline property in the dtb,
to pass an "elfcorehdr=" parameter along with other restrictions (like maxcpus=1).
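
A minimal sketch of that command-line rewrite, assuming the documented
"elfcorehdr=[size[KMG]@]offset[KMG]" syntax. The function name and the example
address and size are hypothetical, and this is not the actual kexec-tools code:

```python
def build_kdump_cmdline(base_cmdline, elfcorehdr_addr, elfcorehdr_size,
                        extra=("maxcpus=1",)):
    """Return a command line for the crash dump kernel (illustrative only)."""
    # Drop any stale elfcorehdr= left over from an earlier load.
    args = [a for a in base_cmdline.split()
            if not a.startswith("elfcorehdr=")]
    # Kernel syntax: elfcorehdr=[size[KMG]@]offset[KMG]
    args.append("elfcorehdr=%#x@%#x" % (elfcorehdr_size, elfcorehdr_addr))
    # Append the other restrictions unless they are already present.
    for opt in extra:
        if opt not in args:
            args.append(opt)
    return " ".join(args)


# Hypothetical values, for illustration only.
print(build_kdump_cmdline("console=ttyAMA0 root=/dev/sda1",
                          0x8FFF0000, 0x1000))
```

The resulting string would then be written back as the command-line property
under /chosen, with everything else in the dtb left as it was.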

>>>> Do you think that for arm64 the kdump kernel only needs to know about
>>>> usable memory? I'm curious how the arm64 kernel gets the full memory layout
>>>> from the boot loader. Via the UEFI memmap?
>>>
>>> When booted via EFI, we use the EFI memory map. The EFI stub handles
>>> acquiring the relevant information and passing that to the first kernel
>>> in the DTB (see Documentation/arm/uefi.txt).
>>
>> Ok, thanks for the pointer. So in the DT we just have UEFI memmap information
>> instead of memory node details.
>
> When booted via EFI, yes.
>
> For NUMA topology in !ACPI kernels, we might need to also retain and
> parse memory nodes, but only for topology information. The kernel would
> still only use memory as described by the EFI memory map.
>
> There's a horrible edge case I've spotted if performing a chain of
> cross-endian kexecs: LE -> BE -> LE, as the BE kernel would have to
> respect the EFI memory map so as to avoid corrupting it for the
> subsequent LE kernel. Other than this I believe everything should just
> work.

A BE kernel doesn't support UEFI yet and cannot access the UEFI memmap table.
So, for LE -> BE, we don't use a dtb generated from /sys/firmware/fdt (or
/proc/device-tree), as we do in the LE -> LE case, and instead require users
to provide a dtb file explicitly.

For BE -> LE, the BE kernel doesn't know whether a UEFI memmap table is
available, and so uses the same (explicitly provided) dtb (as for LE -> LE
in the !UEFI case).
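
The cases above can be summarised as a small decision function. This is only a
sketch of the policy as described in this mail; the function name and return
strings are made up for illustration, and BE -> BE is left out because it is
not discussed here:

```python
def pick_dtb_source(current_endian, next_endian):
    """Say where the dtb for the next kernel comes from (illustrative)."""
    pair = (current_endian, next_endian)
    if pair == ("LE", "LE"):
        # Reuse what the running kernel was given.
        return "/sys/firmware/fdt (or /proc/device-tree)"
    if pair in (("LE", "BE"), ("BE", "LE")):
        # A BE kernel cannot use the UEFI memmap table, so the user
        # has to hand over a dtb file explicitly.
        return "user-supplied dtb file"
    raise ValueError("case not covered in this discussion: %s -> %s" % pair)


for cur, nxt in (("LE", "LE"), ("LE", "BE"), ("BE", "LE")):
    print(cur, "->", nxt, ":", pick_dtb_source(cur, nxt))
```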

>>> A kexec'd kernel should simply inherit that. So long as the DTB and/or
>>> UEFI tables in memory are the same, it would be the same as a cold boot.
>>
>> For kexec all memory ranges are the same; for kdump we need to use the
>> original range reserved with crashkernel= as usable memory, and all other
>> original usable ranges are not usable anymore.
>
> Sure. This is what I believe we should expose with an additional
> property under /chosen, while keeping everything else pristine.
>
> The crash kernel can then limit itself to that region, while it would
> have the information of the full memory map (which it could log and/or
> use to drive other dumping).

FYI,
all the original usable memory regions used by the 1st kernel are also
described in the ELF core header specified by the "elfcorehdr=" parameter to
the crash dump kernel.
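
As a rough illustration, assuming the header is a standard ELF64 image in which
each memory region appears as a PT_LOAD program header, the regions could be
listed like this (load_segments is a made-up helper name, not an existing tool):

```python
import struct

PT_LOAD = 1  # program header type for a loadable segment


def load_segments(blob):
    """Return (p_paddr, p_memsz) for each PT_LOAD entry in an ELF64 blob."""
    if blob[:4] != b"\x7fELF" or blob[4] != 2:  # EI_CLASS 2 == ELFCLASS64
        raise ValueError("not an ELF64 image")
    # EI_DATA: 1 == little-endian, 2 == big-endian
    endian = "<" if blob[5] == 1 else ">"
    ehdr = struct.unpack_from(endian + "16sHHIQQQIHHHHHH", blob, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    regions = []
    for i in range(e_phnum):
        phdr = struct.unpack_from(endian + "IIQQQQQQ", blob,
                                  e_phoff + i * e_phentsize)
        p_type, p_paddr, p_memsz = phdr[0], phdr[4], phdr[6]
        if p_type == PT_LOAD:
            regions.append((p_paddr, p_memsz))
    return regions
```

Given the bytes of such a header, the function returns one (physical address,
size) pair per region.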

-Takahiro AKASHI

>> Is it possible to modify uefi memmap for kdump case?
>
> Technically it would be possible, however I don't think it's necessary,
> and I think it would be disadvantageous to do so.
>
> Describing the range(s) the crash kernel can use in separate properties
> under /chosen has a number of advantages.
>
>>> In the !EFI case, we use the memory nodes in the DTB. Only in this case
>>> could usable-memory properties in memory nodes make sense. I'd prefer a
>>> uniform property under /chosen for both cases.
>>
>> We still use the same DTB, but need to modify the DT and update the usable
>> and unusable nodes for kdump?
>
> We'd have a (slightly) modified DTB that contained additional properties
> describing the range(s) reserved for use by the crash kernel.
>
> Other than those properties under /chosen (e.g. the command line, initrd
> pointers if any), it would be the original DTB.
>
> Thanks,
> Mark.
>