>> kexec_walk_memblock() has the option for "kbuf->top_down". Only
>> kexec_walk_resources() seems to ignore it.
>
> Yeah, that top down searching is done in a found low mem area. Means
> firstly search an available region bottom up, then put kernel top down
> in that region. The reason is our iomem res is linked with singly linked
> list. So we can only search bottom up efficiently.
>
> kexec_load is doing the real top down searching, so kernel will be put
> at the top of system ram. I ever tried to change it to support top down
> searching for kexec_file_load too with patches, since QE and customers
> are often confused with this difference when debugging.
>
> Andrew may remember this, he suggested me to change the singly linked list
> to doubly linked list for iomem res, then do the top down searching for
> kexec_file_load. I tried with some effort, the change introduced too much
> code change, I just gave up finally.

Well, at least right now this seems to be the right approach (hotplug),
lol :)

>
> http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@xxxxxxxxxx/
>
> I can see that top down searching for kexec can avoid the highly used
> low memory region, esp under 4G, for dma, kinds of firmware reserving,
> etc. And customers/QE of kexec get used to it. I can change kexec_file_load
> to top down too with a simple way if people really complain it. But now,
> seems bottom up is not bad too.

Ah, I understand the problem. Maybe a simple "optimization" would be to
start searching bottom-up from e.g., 2GB/4GB first. If nothing was found,
search bottom-up from 0-2GB/4GB etc.

>
>>
>> So I think in case of memblocks (e.g., arm64), this still applies?
>
> Yeah, aren't you trying to remove it? I haven't read your patches
> carefully, maybe I got it wrong. And arm64 even can't support the hot added

For arm64 we're still creating memblocks for hotplugged memory, but I
guess it's not too hard to stop doing that.

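To make the "search from 2GB/4GB first, then fall back" idea from above a
bit more concrete, here is a rough userspace sketch. It is not the kernel's
kexec_walk_resources() code: struct ram_range, search_window(),
locate_hole() and PREFERRED_FLOOR are made-up names, the ranges are assumed
to be sorted by address (like the iomem resource list), and 4GB is just one
of the two floors mentioned. Each window is walked bottom-up, and the buffer
is placed top-down inside the first range that fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a "System RAM" resource; bounds are inclusive. */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

/* Assumed floor for the first pass (4GB); the thread says "2GB/4GB". */
#define PREFERRED_FLOOR (4ULL << 30)

/*
 * Walk the ranges bottom-up and pick the first one that can hold @size
 * bytes inside the window [@min, @max]. Within that range the buffer is
 * placed top-down, i.e. at the highest aligned address that still fits.
 */
static bool search_window(const struct ram_range *r, size_t n,
			  uint64_t min, uint64_t max,
			  uint64_t size, uint64_t align, uint64_t *out)
{
	/* Sketch-level preconditions: non-empty buffer, power-of-two align. */
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = r[i].start > min ? r[i].start : min;
		uint64_t hi = r[i].end < max ? r[i].end : max;

		/* Skip ranges outside the window or too small for the buffer. */
		if (lo > hi || hi - lo < size - 1)
			continue;

		/* Round the highest possible start address down to @align. */
		uint64_t addr = (hi - size + 1) & ~(align - 1);
		if (addr >= lo) {
			*out = addr;
			return true;
		}
	}
	return false;
}

/* First try at or above the floor, then fall back to the memory below it. */
static bool locate_hole(const struct ram_range *r, size_t n,
			uint64_t size, uint64_t align, uint64_t *out)
{
	if (search_window(r, n, PREFERRED_FLOOR, UINT64_MAX, size, align, out))
		return true;
	return search_window(r, n, 0, PREFERRED_FLOOR - 1, size, align, out);
}
```

With ranges at 1MB-2GB, 4GB-6GB and 8GB-10GB, this would pick the top of
the 4GB-6GB range: still not a "real" top-down search, but it stays out of
the low memory unless nothing else fits.
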
> memory being able to recorded into firmware, seems it's not so ready,
> won't they change that design in the future?

It seems to be incomplete, yes. No idea if it's fixable, no arm64 expert ...

>>>>>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>>>>> memory corresponds to LMBs that can be removed, otherwise we're in
>>>>>> trouble ... :) )
>>>>>> - virtio-mem to filter out all memory it added.
>>>>>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>>>>> memory block it added and only partially backed it by memory).
>>>>>>
>>>>>> This would make it work for kexec_file_load(), however, I do wonder how
>>>>>> we would want to approach that from userspace kexec-tools when handling
>>>>>> it from kexec_load().
>>>>>
>>>>> Let's make kexec_file_load work firstly. Since this work is only first
>>>>> step to make kexec-ed kernel not break memory hotplug. After kexec
>>>>> rebooting, the KASLR may locate kernel into hotpluggable area too.
>>>>
>>>> Can you elaborate how that would work?
>>>
>>> Well, boot memory can be hotplugged or not after boot, they are marked
>>> in uefi tables, the current kexec doesn't save and pass them into 2nd
>>> kernel, when kexec kernel bootup, it need read them and avoid them to
>>> randomize kernel into.
>>
>> What about e.g., memory hotplugged by ACPI? I would assume, that the
>> kexec kernel will not make use of that (IOW detected that) until the
>> ACPI driver comes up and re-detects + adds that memory.
>>
>> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?
>
> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> pass the efi, it won't get the SRAT table correctly, if I remember
> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> ACPI only, this won't happen on bare metal though. Need check carefully.
> I have been using kvm guest with uefi firmware recently.

Yeah, I can imagine that bare metal is different.
kvm only uses ACPI.

I'm also asking because of virtio-mem. Memory added via virtio-mem is not
part of any efi tables or whatsoever. So I assume the kexec kernel will not
detect it automatically (good!), instead load the virtio-mem driver and let
it add memory back to the system.

I should probably play with kexec and virtio-mem once I have some spare
cycles ... to find out what's broken and needs to be addressed :)

--
Thanks,

David / dhildenb