On Tue, Apr 24, 2018 at 05:08:57PM +0100, James Morse wrote: > Hi Akashi, > > On 16/04/18 11:08, AKASHI Takahiro wrote: > > On Thu, Apr 12, 2018 at 05:01:52PM +0100, James Morse wrote: > >> On 05/04/18 03:42, AKASHI Takahiro wrote: > >>> On Mon, Apr 02, 2018 at 10:53:32AM +0900, AKASHI Takahiro wrote: > >>>> Basically, changes that I made on /proc/iomem in my new format D were: > >>>> 1. to move NOMAP region entries, formerly named "reserved" and now named > >>>> "reserved (no map)", under System RAM > >>>> 2. to add new entries for firmware-reserved regions as "reserved" also > >>>> under System RAM > >>>> > >>>> On the other hand, current kexec-tools, in particular kexec command, > >>>> only scan top-level "System RAM" entries as well as "reserved" entries. > >> > >> as well as? > > > > I had few words here. > > The current kexec-tools assumes that "reserved" entries appear only > > at the top level. So, > > > >> Does this mean kexec will pick up the reserved region if its written as: > >> | 00001000-0009d7ff : System RAM > >> | 00001000-00001fff : reserved > > > > if this is the case, the range "0x1000-0x1fff" is added to an internal > > list of memory ranges > > I found this in get_memory_ranges_iomem_cb()... > > > > but will later be *ignored* by locate_hole() function > > due to its memory type. > > Ugh. Great. > > > > That is, the range can potentially be overwritten by loaded kernel/initrd. > > So two kernel bugs, one user-space bug, all conspiring. > > > >>>> either because > >>>> a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region > >>>> which are not suitable for usable memory, or > >>>> b. new kernel (or initrd/dtb) may have been allocated on a reserved region > >>>> whose contents can be overwritten. > >>>> > >>>> While we see (b) even today, (a) is a backward compatibility issue. > >> > >> (a) doesn't happen because request_standard_resources() checks > >> memblock_is_nomap(), and reports those regions as 'reserved'. > > > > I might have confused you. The assumption here was that we adopt format (D), > > where all NOMAP regions are sub nodes of "System RAM", but still use > > the current kexec-tools. > > As I said above, this will end up an un-expected behavior. > > I'd like to fix this without having to fix user-space at the same time. It looks > like no-one else has second level reserved regions, This was my assumption when I sent out a patch to kexec-tools. > so we can't blame > kexec-tools for looking straight at them, then ignoring them. > > > >> We can't expect user-space to upgrade to fix this issue. > > > > I'm not sure what you mean here; we can't fix the issue anyway > > without changing user-space/kexec-tools as kexec_load system call totally > > relies on parameters passed by kexec-tools. > > (The only difference is whether we need additional kernel changes or not.) > > It looks like this was always broken because the efi memory map isn't listed as > 'reserved' in /proc/iomem. The fallout for the new stuff is secondary. > > > >>> # I don't know yet whether people are happy with this fix, and also have > >>> kernel patches for my other approaches. They are neither not much > >>> complicated. > >> > >> I don't think we should fix this in userspace, exporting all the > >> memblock_reserved() regions as 'reserved' in /proc/iomem looks like the right > >> thing to do. > > > > Again, if you modify /proc/iomem, you have to update kexec-tools, too. > > If we squash the memblock_reserved() stuff down so it appears as a top level > 'reserved' region too, I don't think we do. If I correctly understand, you're talking about my format (E). As I said, it will fix the issue without modifying user-space, but || This does not only look quite noisy but also ignores the fact that || reserved regions are part of System RAM (or memblock.memory). > This prevents the efi-memory-map > being overwritten on kernels since kexec was merged. > > Its horribly fiddly to do this. The kernel code/data are special reserved > regions that we already describe as a subset of system-ram, even though they are > both also fragments of a bigger memblock_reserved() block. Actually, we don't have to avoid kernel code/data regions as copying loaded data to the final destinations will be done at the very end of kexec. > While we can walk memblock for regions that aren't reserved, allocating memory > in the loop changes what is reserved. That one O(N) walk ends up being four... At most O(n^2)? Thanks, -Takhairo AKASHI > I'm almost done tearing my hair out, I should have a working patch soon... > > > >> wasn't there going to be another version, with the core EFI > >> stuff split out? > > > > ? I don't remember well ... > > https://lkml.org/lkml/2018/2/1/496 > > > Thanks, > > James > _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec