On 04/16/20 at 04:09pm, David Hildenbrand wrote:
> >>> Sounds doable to me, and not complicated.
> >>>
> >>>> images. It would apply to
> >>>>
> >>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
> >>>>   be used).
> >>>
> >>> Do you mean hot added memory after boot can't be recognized and added
> >>> into system RAM on arm64?
> >>
> >> See patch #3 of this patch set, which wants to avoid placing kexec
> >> binaries on hotplugged memory. But I have no idea what the current plan
> >> regarding arm64 is (this thread exploded :) ).
> >>
> >> I would assume that we don't want to place kexec images on any
> >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture.
> >
> > Yes, noticed that and James replied to DaveY.
> >
> > Later, when I was considering to make a draft patch to do the picking of
> > memory from normal zone, and add a notifier, as we discussed at above, I
> > suddenly realized that kexec_file_load doesn't have this issue. It
> > traverse system RAM bottom up to get an available region to put
> > kernel/initrd/boot_param, etc. I can't think of a system where its
> > low memory could be unavailable.
>
> kexec_walk_memblock() has the option for "kbuf->top_down". Only
> kexec_walk_resources() seems to ignore it.

Yeah, that top-down searching is done inside a low memory area that has
already been found. That is, we first search bottom up for an available
region, then place the kernel top down within that region. The reason is
that our iomem resources are linked in a singly linked list, so we can
only search bottom up efficiently. kexec_load does a real top-down
search, so the kernel is put at the top of system RAM. I once tried,
with patches, to make kexec_file_load support top-down searching too,
since QE and customers are often confused by this difference when
debugging. Andrew may remember this; he suggested that I change the
singly linked list to a doubly linked list for iomem resources, then do
the top-down search for kexec_file_load.
I tried that with some effort, but it required too much code change, so
I gave up in the end.

http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@xxxxxxxxxx/

I can see that top-down searching for kexec avoids the heavily used low
memory region, especially under 4G, which is needed for DMA, various
firmware reservations, etc. And customers/QE of kexec have gotten used
to it. I can change kexec_file_load to top down too in a simple way if
people really complain about it. But for now, bottom up doesn't seem bad
either.

>
> So I think in case of memblocks (e.g., arm64), this still applies?

Yeah, aren't you trying to remove it? I haven't read your patches
carefully, so maybe I got it wrong. And arm64 can't even record hot-added
memory into firmware yet; it doesn't seem ready. Won't they change that
design in the future?

>
> >>
> >>>
> >>>
> >>>> - powerpc to filter out all LMBs that can be removed (assuming not all
> >>>>   memory corresponds to LMBs that can be removed, otherwise we're in
> >>>>   trouble ... :) )
> >>>> - virtio-mem to filter out all memory it added.
> >>>> - hyper-v to filter out partially backed memory blocks (esp. the last
> >>>>   memory block it added and only partially backed it by memory).
> >>>>
> >>>> This would make it work for kexec_file_load(), however, I do wonder how
> >>>> we would want to approach that from userspace kexec-tools when handling
> >>>> it from kexec_load().
> >>>
> >>> Let's make kexec_file_load work firstly. Since this work is only first
> >>> step to make kexec-ed kernel not break memory hotplug. After kexec
> >>> rebooting, the KASLR may locate kernel into hotpluggable area too.
> >>
> >> Can you elaborate how that would work?
> >
> > Well, boot memory can be hotplugged or not after boot, they are marked
> > in uefi tables, the current kexec doesn't save and pass them into 2nd
> > kernel, when kexec kernel bootup, it need read them and avoid them to
> > randomize kernel into.
>
> What about e.g., memory hotplugged by ACPI?
> I would assume, that the
> kexec kernel will not make use of that (IOW detected that) until the
> ACPI driver comes up and re-detects + adds that memory.
>
> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?

The ACPI SRAT is reached through EFI; the RSDP pointer needs to be read
out from there. If we don't pass EFI, the kexec kernel won't get the
SRAT table correctly, if I remember correctly. Yeah, I remember that a
KVM guest can get memory hotplugged with ACPI only; this won't happen on
bare metal though. I need to check this carefully. I have been using a
KVM guest with UEFI firmware recently.
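
To make the difference between the two search orders discussed above
concrete, here is a minimal sketch in Python. It is not the kernel or
kexec-tools code: Res, make_list, place_first_fit and place_highest are
hypothetical names, and the model only assumes what is described in this
mail (System RAM ranges in a singly linked list, walked from low to high
addresses).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Res:
    """One System RAM range [start, end] (end inclusive).

    Hypothetical stand-in for an iomem resource: it only links to the
    next higher range, so it can be walked bottom up only.
    """
    start: int
    end: int
    next: Optional["Res"] = None


def make_list(ranges):
    """Build the singly linked list from (start, end) pairs, lowest first."""
    head = None
    for start, end in reversed(ranges):
        head = Res(start, end, head)
    return head


def align_up(x, align):
    return (x + align - 1) & ~(align - 1)


def align_down(x, align):
    return x & ~(align - 1)


def place_first_fit(head, size, align, top_down=False):
    """Walk ranges bottom up and stop at the first one that fits.

    top_down only decides where the buffer sits inside that range;
    it does not change which range is picked.
    """
    res = head
    while res is not None:
        if top_down:
            addr = align_down(res.end + 1 - size, align)
        else:
            addr = align_up(res.start, align)
        if addr >= res.start and addr + size - 1 <= res.end:
            return addr
        res = res.next
    return None


def place_highest(head, size, align):
    """Real top-down search: try the highest range first.

    With a singly linked list the ranges have to be collected first,
    which is the extra cost a doubly linked list would avoid.
    """
    ranges = []
    res = head
    while res is not None:
        ranges.append(res)
        res = res.next
    for res in reversed(ranges):
        addr = align_down(res.end + 1 - size, align)
        if addr >= res.start:
            return addr
    return None
```

With made-up ranges 0x1000-0x9ffff, 0x100000-0x3fffffff and
0x100000000-0x13fffffff, and a 2M buffer aligned to 2M, place_first_fit
returns 0x200000 (or 0x3fe00000 with top_down=True, still in the first
range that fits), while place_highest returns 0x13fe00000 at the top of
RAM.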