James, My apologies for slow response. I had a long weekend. On Tue, Mar 27, 2018 at 02:32:49PM +0100, James Morse wrote: > Hi Akashi, > > On 27/03/18 11:16, AKASHI Takahiro wrote: > > On Tue, Mar 20, 2018 at 01:18:34AM +0530, Bhupesh Sharma wrote: > >> On 03/14/2018 01:59 PM, AKASHI Takahiro wrote: > >>> Currently, there is a inconsistent view between (A) and the mainline's: > >>> see (A-1) and (B-1). If this is really a matter, I can fix it. > >>> Kexec-tools can be easily modified to accept both formats, though. > > Ooer, what needs changing in kexec-tools? What happens if someone doesn't update > userspace at the same time? Basically, changes that I made on /proc/iomem in my new format D were: 1. to move NOMAP region entries, formerly named "reserved" and now named "reserved (no map)", under System RAM 2. to add new entries for firmware-reserved regions as "reserved" also under System RAM On the other hand, current kexec-tools, in particular kexec command, only scan top-level "System RAM" entries as well as "reserved" entries. So if someone doesn't update kexec-tools, secondary kernel may potentially crash during boot time either because a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region which are not suitable for usable memory, or b. new kernel (or initrd/dtb) may have been allocated on a reserved region whose contents can be overwritten. While we see (b) even today, (a) is a backward compatibility issue. Note: we have a different story for kdump (alignment error), and I will take a different approach to fixing kdump case. > Is there a format which doesn't require a user-space change, (and shouldn't we > pick that one?) The only solution that I can imagine for now to prevent (a) and (b) at the same time without any user-space change is 2+. to add new entries for firmware-reserved regions as "reserved", in addition to the current NOMAP regions, at top level (format E) 40000000-5858ffff : System RAM 40080000-40f1ffff : Kernel code 41040000-411e9fff : Kernel data 54400000-583fffff : Crash kernel 58590000-585effff : reserved 484f0000-586fffff : System RAM 58700000-5871ffff : reserved 58720000-58b5ffff : reserved (no map) 58b60000-58b0ffff : System RAM 58b61000-58b61fff : reserved 58620000-59a7b117 : System RAM 59a7b118-59a7b667 : reserved 59a7b668-5be3ffff : System RAM 5be40000-5becffff : reserved (no map) 5bed0000-5bedffff : System RAM 5bee0000-5bffffff : reserved (no map) 5ec00000-5edfffff : reserved 5ee00000-5fffffff ; System RAM 8000000000-ffffffffff : PCI Bus 0000:00 This does not only look quite noisy but also ignores the fact that reserved regions are part of System RAM (or memblock.memory). Or to maximize the compatibility, we may adopt format B: (format B) 40000000-5871ffff : System RAM 40080000-40f1ffff : Kernel code 41040000-411e9fff : Kernel data 54400000-583fffff : Crash kernel 58590000-585effff : reserved 58700000-5871ffff : reserved 58720000-58b5ffff : reserved (no map) 58b60000-5be3ffff : System RAM 58b61000-58b61fff : reserved 59a7b118-59a7b667 : reserved 5be40000-5becffff : reserved (no map) 5bed0000-5bedffff : System RAM 5bee0000-5bffffff : reserved (no map) 5c000000-5fffffff : System RAM 5ec00000-5edfffff : reserved 8000000000-ffffffffff : PCI Bus 0000:00 but, in this case, we need some change on kexec-tools to fix (b). > >>> 2. How should we determine which regions be exported in /proc/iomem? > >>> > >>> a. Trust all the memblock_reserve'd regions as my previous patch [3] does. > >>> > >>> As I said, it's a kind of "overkill." Some of regions, say fdt, are > >>> not required to be preserved across kexec. > >> > >> > >> I think we should preserve all the memblock_reserve'd regions. So +1 on this > >> approach from my side. I believe it might help avoid issues we have seen in > >> the past with 'kexec-tools' _incorrectly_ determining which regions to pick > >> from the '/proc/iomem'. > > > > As I said in my reply to Ard's comment, I now know *overkill* is not a big > > issue and I will go for this approach. > > /sys/kernel/debug/memblock/reserved has all kinds of weird stuff in it, > including some smaller-than-a-page reservations that appear to come from the > percpu allocator. > > I agree it will make the implementation simpler, and reserving 'too much' isn't > an issue. Are you suggesting that we should use /sys/kernel/debug/memblock/reserved without modifying current /proc/iomem? (Note that, even in this approach, we need an user-space change.) Hmm, overall, this approach will be preferable to format B/E. Thanks, -Takahiro AKASHI > > Thanks, > > James _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec