On 01/21/2016 09:02 PM, Mark Rutland wrote: > On Thu, Jan 21, 2016 at 03:53:42PM +0900, AKASHI Takahiro wrote: >> On 01/20/2016 08:49 PM, Mark Rutland wrote: >>> On Wed, Jan 20, 2016 at 03:07:53PM +0900, AKASHI Takahiro wrote: >>>> On 01/20/2016 11:49 AM, Dave Young wrote: >>>>> On 01/19/16 at 02:01pm, Mark Rutland wrote: >>>>>> On Tue, Jan 19, 2016 at 09:45:53PM +0800, Dave Young wrote: >>>>>>> On 01/19/16 at 12:51pm, Mark Rutland wrote: >>>>>>>> On Tue, Jan 19, 2016 at 08:28:48PM +0800, Dave Young wrote: >>>>>>>>> On 01/19/16 at 02:35pm, AKASHI Takahiro wrote: >>>>>>>>>> On 01/19/2016 10:43 AM, Dave Young wrote: >>>>>>>>>>> X86 takes another way in latest kexec-tools and kexec_file_load, that is >>>>>>>>>>> recreating E820 table and pass it to kexec/kdump kernel, if the entries >>>>>>>>>>> are over E820 limitation then turn to use setup_data list for remain >>>>>>>>>>> entries. >>>>>>>>>> >>>>>>>>>> Thanks. I will visit x86 code again. >>>>>>>>>> >>>>>>>>>>> I think it is X86 specific. Personally I think device tree property is >>>>>>>>>>> better. >>>>>>>>>> >>>>>>>>>> Do you think so? >>>>>>>>> >>>>>>>>> I'm not sure it is the best way. For X86 we run into problem with >>>>>>>>> memmap= design, one example is pci domain X (X>1) need the pci memory >>>>>>>>> ranges being passed to kdump kernel. When we passed reserved ranges in /proc/iomem >>>>>>>>> to 2nd kernel we find that cmdline[] array is not big enough. >>>>>>>> >>>>>>>> I'm not sure how PCI ranges relate to the memory map used for normal >>>>>>>> memory (i.e. RAM), though I'm probably missing some caveat with the way >>>>>>>> ACPI and UEFI describe PCI. Why does memmap= affect PCI memory? >>>>>>> >>>>>>> Here is the old patch which was rejected in kexec-tools: >>>>>>> http://lists.infradead.org/pipermail/kexec/2013-February/007924.html >>>>>>> >>>>>>>> >>>>>>>> If the kernel got the rest of its system topology from DT, the PCI >>>>>>>> regions would be described there. >>>>>>> >>>>>>> Yes, if kdump kernel use same DT as 1st kernel. >>>>>> >>>>>> Other than for testing purposes, I don't see why you'd pass the kdump >>>>>> kernel a DTB inconsistent with that the 1st kernel was passsed (other >>>>>> than some proerties under /chosen). >>>>>> >>>>>> We added /sys/firmware/fdt specifically to allow the kexec tools to get >>>>>> the exact DTB the first kernel used. There's no reason for tools to have >>>>>> to make something up. >>>>> >>>>> Agreed, but kexec-tools has an option to pass in any dtb files. Who knows >>>>> how one will use it unless dropping the option and use /sys/firmware/fdt >>>>> unconditionally. >>>> >>>> As a matter of fact, specifying proper command line parameters as well as >>>> dtb is partly users' responsibility for kdump to work correctly. >>>> (especially for BE kernel) >>>> >>>>> If we choose to implement kexec_file_load only in kernel, the interfaces >>>>> provided are kernel, initrd and cmdline. We can always use same dtb. >>>> >>>> I would say that we can always use the same dtb even with kexec_load >>> >from user's perspective. Right? >>> >>> No. >>> >>> This breaks using kexec for boot-loader purposes, and imposes a policy. >> >> What kind of policy? >> I said "can", but if we want to use other setting/configuration, we can >> still have a full control. > > Apologies, I misunderstood. > > In most cases, using the existing DTB (with minor modifications to > /chosen for bootargs and such) is fine. If the user just wants to boot > another Linux kernel, that's generally what they should do. > > I think we're agreed on that. Yes. > However, there are cases when the user may want to use a different DTB, > or use a different purgatory. So we cannot mandate that the existing DTB > is reused, nor that an in-kernel purgatory must be used, as that imposes > a policy. Agree! >>> For better or worse kexec_file_load has always imposed a constrained >>> Linux-only policy, so that's a different story. >>> >>>>>> There's a horrible edge case I've spotted if performing a chain of >>>>>> cross-endian kexecs: LE -> BE -> LE, as the BE kernel would have to >>>>>> respect the EFI memory map so as to avoid corrupting it for the >>>>>> subsequent LE kernel. Other than this I believe everything should just >>>>>> work. >>>>> >>>>> Firmware do not know kernel endianniess, kernel should respect firmware >>>>> maps and adapt to it, it sounds like a generic issue not specfic to kexec. >>>> >>>> On arm64, a kernel image header has a bit field to specify the image's endianness. >>>> Anyway, our current implementation replies on a user-supplied dtb to start BE kernel. >>> >>> The firmware should _never_ care about the kernel's endianness. The >>> bootlaoder or first kernel shouldn't care about the next kernel's >>> endianness apart from in exceptional circumstances. The DTB for a LE >>> kernel should look identical to that passed to a BE kernel. >> >> Please note that I didn't say anything different from your last two statements. >> The current arm64 kexec implementation doesn't do anything specific to BE, >> but as far as BE kernel doesn't support UEFI, users are responsible for >> providing a proper dtb. > > I'm just confused as to what you mean by a "proper dtb" in that case. > > If you just mean one with memory nodes hacked in, then that would > currently be a way to make that work, yes. One of useful cases that I have in my mind is kdump. We may want to use a small sub-set of dtb, especially devices, to make the reboot more reliable. Device drivers are likely to be vulnerable at crash. > It seems like the better option is to fix the BE kernel to support a > UEFI memory map, as that solves other issues. Why did Ard throw away his patch? >> >>> In my mind, the only valid reason to look at that bit is so that >>> bootloaders can provide a warning if the CPU does not implement that >>> endianness. >>> >>> The issue I mention above should be solved by changes to the BE kernel. >>> >>>>>>> Is it possible to modify uefi memmap for kdump case? >>>>>> >>>>>> Technically it would be possible, however I don't think it's necessary, >>>>>> and I think it would be disadvantageous to do so. >>>>>> >>>>>> Describing the range(s) the crash kernel can use in separate properties >>>>>> under /chosen has a number of advantages. >>>>> >>>>> Ok, I got the points. We have a is_kdump_kernel() by checking if there is >>>>> elfcorehdr_addr kernel cmdline. This is mainly for some drivers which >>>>> do not work well in kdump kernel some uncertain reasons. But ideally I >>>>> think kernel should handle things just like in 1st kernel and avoid to use >>>>> it. >>>> >>>> So I'm not still sure about what are advantages of a property under /chosen >>>> over "memmap=" kernel parameter. >>>> Both are simple and can have the same effect with minimizing changes to dtb. >>>> (But if, in the latter case, we have to provide *all* the memory-related information >>>> through "memmap=" parameters, it would be much complicated.) >>> >>> The reason I prefer a property over command line additions include: >> >> Take some examples: >> (a) a property under /chosen >> { >> chosen = { >> cmdline = "elfcorehdr=AA at BB maxcpus=1 ..."; >> } >> usable-memory = <XX YY>; >> memory { >> ... >> } >> } >> >> (b) a kernel command line parameter >> (I use the same name, "usable-memory", to show the similarity. may use another name though.) >> { >> chosen = { >> cmdline = "elfcorehdr=AA at BB maxcpus=1 usable-memory=YY at XX ..."; >> } >> memory { >> ... >> } >> } >> >>> * It keeps the command line simple (as you mention the opposite is >>> "complicated"). >> >> I think both are simple. >> >>> * It is logically separate from options the user may pass to the kernel >>> in that the restricted region(s) of memory avaialble are effectively >>> properties of the system (in that the crashed OS is part of the system >>> state). >> >> "elfcorehdr=" parameter already breaks your point. >> "elfcorehdr=" looks to be, what you say, a system property, and is actually >> added by kexec-tools on all architectures, and "usable-memory", whether it is >> a DT property or a kernel parameter, will also be added by kexec-tools. >> (Users don't have to care.) > > Just because architectures do one thing today does not mean that we have > to follow it. > > I don't think that breaks my point so much as shows that a different > approach is taken by others today. > > There's also no reason this cannot be a property under /chosen. No, but no strong reason to be so IMO. >>> * The semantics of the command line parsing can change subtly over time >>> (for example, see 51e158c12aca3c9a, which terminates command line >>> parseing at "--"). Maknig sure that a command line option will >>> actually be parsed by the next kernel is not trivial. >>> >>> Keeping this information isolated from the command line is more >>> robust. >> >> Even so, who wants to use kdump without testing? >> and this is not a kdump specific issue. >> >>> * Addition of a property is a self-contained operation, that doesn't >>> require any knowledge about the command line. >> >> I don't get your point here. >> For a kernel parameter, early_param() can encapsulate all the stuffs necessary. >> Once the kernel recognizes a usable memory region, limiting available >> memory should be done in the exact same way. > > I mean when modifying the command line. OK, I understand what you mean. > To place "elfcorehdr=" or "memmap="/"usable-memory=" into the command > line, one needs to know where it is valid to place it. Appending doesn't > always work, as per the example above with 51e158c12aca3c9a. So, are you now suggesting that we put both "elfcorehdr=" and "usable-memory=" under /chosen in dtb? That's fair enough. (as far as nobody cares about incompatibility with other archs.) -Takahiro AKASHI > For both of these my point was that generally there is some fragility in > this area. While it's easy to say that breaking this would be someone > else's fault, we can easily avoid the possibility of that happening, and > avoid a set of problems trying to maintain backwards compatibility if > there were a sensible change that happened to break things. > > Thanks, > Mark. >