David Hildenbrand <david@xxxxxxxxxx> writes:

>> The confusing part was talking about memory being still in use,
>> that is actually scheduled for use in the future.
>
> +1
>
>>
>>>> Usually somewhere in the loaded image
>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>> That will invalidate the memory map as well.
>>>
>>> Ah, unconditionally. Sure, x86 needs this.
>>> (arm64 re-discovers the memory map from firmware tables after kexec)
>
> Does this include hotplugged DIMMs e.g., under KVM?
> [...]

As far as I know. If the memory map changes we need to drop the loaded
image.

Having thought about it a little more I suspect it would be the other
way and just block all hotplug actions after a kexec_load. As all we
expect to happen is running shutdown scripts.

If blocking the hotplug action uses printk to print a nice message
saying something like: "Hotplug blocked because of a loaded kexec
image", then people will be able to figure out what is going on and
call kexec -u if they haven't started the shutdown scripts yet.

Either way it is something simple and unconditional that will make
things work.

>>>> All of this should be for a very brief window of a few seconds, as
>>>> the loaded kexec image is quite short.
>>>
>>> It seems I'm the outlier anticipating anything could happen between
>>> those syscalls.
>>
>> The design is:
>>     sys_kexec_load()
>>     shutdown scripts
>>     sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>>
>> There are two system call simply so that the shutdown scripts can run.
>> Now maybe someone somewhere does something different but that is not
>> expected.
>>
>> Only the kexec on panic kernel is expected to persist somewhat
>> indefinitely. But that should be in memory that is reserved from boot
>> time, and so the memory hotplug should have enough visibility to not
>> allow that memory to be given up.
>
> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
> can usually not get offlined and, therefore, the memory cannot get removed.
>
> Interestingly, s390x even has a hotplug notifier for that
>
> arch/s390/kernel/setup.c:kdump_mem_notifier()
>
> (offlining of memory on s390x can result in memory getting depopulated
> in the hypervisor, so after it would have been offlined, it would no
> longer be accessible. I somewhat doubt that this notifier is really
> needed - all pages in the crashkernel area should look like ordinary
> allocated pages when the area is reserved early during boot via the
> memblock allocator, and therefore offlining cannot succeed. But that's a
> different story - and I suspect this is a leftover from pre-memblock times.)

It might be worth seeing if that is true, or if we need to generalize
the s390x code.

Eric
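
For illustration only, the "block all hotplug actions after a kexec_load"
idea from this thread can be modelled in a few lines of userspace C. This
is not kernel code and not a proposed patch: struct kexec_state, the
model_* functions, the hotplug_action enum and the choice of -EBUSY as the
error are all hypothetical names and assumptions made for the sketch. Only
the message text is taken from the mail above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a hotplug request; not a kernel type. */
enum hotplug_action { HOTPLUG_ADD, HOTPLUG_REMOVE };

/* Hypothetical stand-in for "a kexec image is currently loaded". */
struct kexec_state {
    bool image_loaded;
};

/* Models a successful kexec_load: the image now pins the memory map. */
void model_kexec_load(struct kexec_state *s)
{
    assert(s != NULL);
    s->image_loaded = true;
}

/* Models unloading the image (what "kexec -u" asks for). */
void model_kexec_unload(struct kexec_state *s)
{
    assert(s != NULL);
    s->image_loaded = false;
}

/*
 * Models the unconditional policy: while an image is loaded, every
 * hotplug action is refused, regardless of its kind.
 * Returns 0 if the action may proceed, -EBUSY (an assumed error code)
 * if it is blocked.
 */
int model_hotplug_request(const struct kexec_state *s, enum hotplug_action a)
{
    assert(s != NULL);
    (void)a; /* the policy does not distinguish add from remove */

    if (s->image_loaded) {
        fprintf(stderr, "Hotplug blocked because of a loaded kexec image\n");
        return -EBUSY;
    }
    return 0;
}
```

In this model a request succeeds before model_kexec_load(), fails with
-EBUSY afterwards, and succeeds again after model_kexec_unload(), which
mirrors the "call kexec -u" way out described above. A real implementation
would more likely sit in a memory hotplug notifier, which the sketch does
not attempt to show.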