On 01/20/2016 11:49 AM, Dave Young wrote:
> On 01/19/16 at 02:01pm, Mark Rutland wrote:
>> On Tue, Jan 19, 2016 at 09:45:53PM +0800, Dave Young wrote:
>>> On 01/19/16 at 12:51pm, Mark Rutland wrote:
>>>> On Tue, Jan 19, 2016 at 08:28:48PM +0800, Dave Young wrote:
>>>>> On 01/19/16 at 02:35pm, AKASHI Takahiro wrote:
>>>>>> On 01/19/2016 10:43 AM, Dave Young wrote:
>>>>>>> X86 takes another approach in the latest kexec-tools and
>>>>>>> kexec_file_load: it recreates the E820 table and passes it to the
>>>>>>> kexec/kdump kernel; if the entries exceed the E820 limit, it uses
>>>>>>> the setup_data list for the remaining entries.
>>>>>>
>>>>>> Thanks. I will visit x86 code again.
>>>>>>
>>>>>>> I think it is X86 specific. Personally I think a device tree
>>>>>>> property is better.
>>>>>>
>>>>>> Do you think so?
>>>>>
>>>>> I'm not sure it is the best way. For X86 we ran into problems with the
>>>>> memmap= design; one example is that pci domain X (X>1) needs the pci
>>>>> memory ranges to be passed to the kdump kernel. When we passed the
>>>>> reserved ranges in /proc/iomem to the 2nd kernel, we found that the
>>>>> cmdline[] array is not big enough.
>>>>
>>>> I'm not sure how PCI ranges relate to the memory map used for normal
>>>> memory (i.e. RAM), though I'm probably missing some caveat with the way
>>>> ACPI and UEFI describe PCI. Why does memmap= affect PCI memory?
>>>
>>> Here is the old patch which was rejected in kexec-tools:
>>> http://lists.infradead.org/pipermail/kexec/2013-February/007924.html
>>>
>>>>
>>>> If the kernel got the rest of its system topology from DT, the PCI
>>>> regions would be described there.
>>>
>>> Yes, if the kdump kernel uses the same DT as the 1st kernel.
>>
>> Other than for testing purposes, I don't see why you'd pass the kdump
>> kernel a DTB inconsistent with the one the 1st kernel was passed (other
>> than some properties under /chosen).
>>
>> We added /sys/firmware/fdt specifically to allow the kexec tools to get
>> the exact DTB the first kernel used.
>> There's no reason for tools to have to make something up.
>
> Agreed, but kexec-tools has an option to pass in any dtb file. Who knows
> how one will use it, unless we drop the option and use /sys/firmware/fdt
> unconditionally.

As a matter of fact, specifying proper command line parameters as well as
the dtb is partly the user's responsibility for kdump to work correctly
(especially for a BE kernel).

> If we choose to implement kexec_file_load only in the kernel, the
> interfaces provided are kernel, initrd and cmdline. We can always use the
> same dtb.

I would say that we can always use the same dtb even with kexec_load
from the user's perspective. Right?
(The difference is whether changes are made by the kernel itself or by
kexec-tools.)

>>
>>>>> Do you think for arm64 only usable memory is necessary to let the
>>>>> kdump kernel know? I'm curious about how the arm64 kernel gets the
>>>>> whole memory layout from the boot loader, via the UEFI memmap?
>>>>
>>>> When booted via EFI, we use the EFI memory map. The EFI stub handles
>>>> acquiring the relevant information and passing that to the first kernel
>>>> in the DTB (see Documentation/arm/uefi.txt).
>>>
>>> Ok, thanks for the pointer. So in the dt we just have the uefi memmap
>>> information instead of memory node details..
>>
>> When booted via EFI, yes.
>>
>> For NUMA topology in !ACPI kernels, we might need to also retain and
>> parse memory nodes, but only for topology information. The kernel would
>> still only use memory as described by the EFI memory map.
>>
>> There's a horrible edge case I've spotted if performing a chain of
>> cross-endian kexecs: LE -> BE -> LE, as the BE kernel would have to
>> respect the EFI memory map so as to avoid corrupting it for the
>> subsequent LE kernel. Other than this I believe everything should just
>> work.
>
> Firmware does not know the kernel's endianness; the kernel should respect
> the firmware maps and adapt to them. It sounds like a generic issue, not
> specific to kexec.

On arm64, a kernel image header has a bit field to specify the image's
endianness. Anyway, our current implementation relies on a user-supplied
dtb to start a BE kernel.

>>
>>>> A kexec'd kernel should simply inherit that. So long as the DTB and/or
>>>> UEFI tables in memory are the same, it would be the same as a cold boot.
>>>
>>> For kexec all memory ranges are the same; for kdump we need to use the
>>> original range reserved with crashkernel= as usable memory, and all
>>> other original usable ranges are not usable anymore.
>>
>> Sure. This is what I believe we should expose with an additional
>> property under /chosen, while keeping everything else pristine.
>>
>> The crash kernel can then limit itself to that region, while it would
>> have the information of the full memory map (which it could log and/or
>> use to drive other dumping).
>
> In this way the kernel should be aware it is a kdump boot. It is doable,
> though I feel it is better for the kdump kernel to be in a black box with
> information it can use just like the 1st kernel. The question here is
> where we choose to cook the memory information: in the boot loader or in
> the kernel itself.
>
>>
>>> Is it possible to modify the uefi memmap for the kdump case?
>>
>> Technically it would be possible, however I don't think it's necessary,
>> and I think it would be disadvantageous to do so.
>>
>> Describing the range(s) the crash kernel can use in separate properties
>> under /chosen has a number of advantages.
>
> Ok, I got the points. We have is_kdump_kernel(), which checks whether
> there is an elfcorehdr_addr on the kernel cmdline. This is mainly for some
> drivers which do not work well in the kdump kernel for some uncertain
> reasons. But ideally I think the kernel should handle things just like in
> the 1st kernel and avoid using it.

So I'm still not sure what the advantages are of a property under /chosen
over the "memmap=" kernel parameter.
Both are simple and can have the same effect while minimizing changes to
the dtb.
(But if, in the latter case, we have to provide *all* the memory-related
information through "memmap=" parameters, it would be much more
complicated.)

-Takahiro AKASHI

>>
>>>> In the !EFI case, we use the memory nodes in the DTB. Only in this case
>>>> could usable-memory properties in memory nodes make sense. I'd prefer a
>>>> uniform property under /chosen for both cases.
>>>
>>> We still use the same DTB; do we need to modify the DT and update the
>>> usable and unusable nodes for kdump?
>>
>> We'd have a (slightly) modified DTB that contained additional properties
>> describing the range(s) reserved for use by the crash kernel.
>>
>> Other than those properties under /chosen (e.g. the command line, initrd
>> pointers if any), it would be the original DTB.
>>
>> Thanks,
>> Mark.
>
> Thanks
> Dave
>
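
Whichever carrier is chosen (a range property under /chosen or "memmap="
parameters), the underlying computation discussed in this thread is the same:
keep the full memory map as it is and restrict the crash kernel to the part of
it that falls inside the crashkernel= reservation. The following is only a
minimal Python sketch of that idea, not code from kexec-tools or the kernel.
The function names and the example addresses are hypothetical, and the
"memmap=size@start" text form is borrowed from the x86 kernel parameter purely
for illustration; no arm64 property name or format is implied.

```python
def usable_for_crash_kernel(memory_map, crash_ranges):
    """Return the parts of memory_map that lie inside crash_ranges.

    Both arguments are lists of (start, size) tuples in bytes.
    The full memory_map is left untouched, so the crash kernel could
    still log it or use it to drive dumping.
    """
    usable = []
    for m_start, m_size in memory_map:
        m_end = m_start + m_size
        for c_start, c_size in crash_ranges:
            c_end = c_start + c_size
            lo, hi = max(m_start, c_start), min(m_end, c_end)
            if lo < hi:  # non-empty overlap
                usable.append((lo, hi - lo))
    return sorted(usable)


def memmap_args(ranges):
    """Format (start, size) ranges as 'memmap=size@start' strings."""
    return ["memmap=%#x@%#x" % (size, start) for start, size in ranges]


# Hypothetical layout: two RAM banks and one crashkernel= reservation.
full_map = [(0x80000000, 0x40000000), (0x880000000, 0x80000000)]
reserved = [(0xA0000000, 0x10000000)]

usable = usable_for_crash_kernel(full_map, reserved)
print(" ".join(memmap_args(usable)))  # memmap=0x10000000@0xa0000000
```

With one reserved region the resulting string is short. The cmdline[] size
problem mentioned for x86 shows up when every range in the map has to be
spelled out this way, which is the case where a structured property scales
better.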