On 30.03.20 19:17, James Morse wrote:
> Hi David,
>
> On 3/30/20 2:13 PM, David Hildenbrand wrote:
>>> Adding a sentence about the way kexec load works may help, the first paragraph
>>> would read:
>>>
>>> | Kexec allows user-space to specify the address that the kexec image should be
>>> | loaded to. Because this memory may be in use, an image loaded for kexec is not
>>> | stored in place, instead its segments are scattered through memory, and are
>>> | re-assembled when needed. In the meantime, the target memory may have been
>>> | removed.
>>>
>>> Do you think that's clearer?
>>
>> Yes, very much. Maybe add, that the target is described by user space
>> during kexec_load() and that user space - right now - parses /proc/iomem
>> to find applicable system memory.
>
> (I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that
> expanded commit message, once we've agreed this is the right thing to do!

Right, I can see kexec-tools parsing /sys/firmware/memmap first.

Unfortunately, all hotplugged memory (via add_memory()) is indicated
there as System RAM ... including memory added by virtio-mem. I think we
should adapt the type there as well. (in your patch #2)

	firmware_map_add_hotplug(start, start + size, "System RAM");

>
>
>>>> I wonder if we should instead make the "kexec -e" fail. It tries to
>>>> touch random system memory.
>>>
>>> Heh, isn't touching random system memory what kexec does?!
>>
>> Having a racy user interface that can trigger kernel crashes feels very
>> wrong. We should limit the impact.
>
>
>>> It's all described to user-space as 'System RAM'. Teaching it to probe
>>> /sys/devices/memory/... would require a user-space change.
>>
>> I think we should really rename hotplugged memory on all architectures.
>>
>> Especially also relevant for virtio-mem/hyper-v balloon, where some
>> pieces of (hotplugged) memory blocks are partially unavailable and
>> should not be touched - accessing them results in unpredictable behavior
>> (e.g., crashes or discarded writes).
>
> I'll need to look into these. I'd assume for KVM that virtio-mem can be brought
> back when it's accessed ... it's just going to be slow.

Touching unplugged virtio-mem memory can result in unpredictable
behavior. Touching (some) unplugged Hyper-V memory will be handled
similarly AFAIK.

[...]

>> 1. It's racy. If memory is getting offlined/unplugged just while user
>> space is about to trigger the kexec_load(), you end up with the very
>> same triple-fault.
>
> load? How is this different to user-space providing a bogus address?

I guess it's not different. It's just racy because user space with good
intent could crash the system :)

>
> Sure, user-space may take a nap between parsing /proc/iomem and calling
> kexec_load(), but the kernel should reject these as they would never work.
>
> (I can't see where sanity_check_segment_list() considers the platform's memory.
> If it doesn't, we should fix it)

Right, that's what I meant. I was not able to find any sanity checks.
Maybe they are in place but I was not able to spot them.

>
> Once the image is loaded, and clashes with a request to remove the memory there
> are two choices: secretly unload the image, or prevent the memory being taken
> offline.

Exactly. Or make "kexec -e" fail.

>
>
>> 2. It's semantically wrong. kexec does not need online memory ("managed
>> by the buddy"), but still you disallow offlining memory.
>
> It does need the memory if you want 'kexec -e' to succeed.
> If there were any sanity tests, they should have happened at load time.

Offlining != removing. That's the point I was trying to make. (and we
don't want to block removing of memory in the kernel any other way)

>
> The memory is effectively in use by the loaded kexec image.
> User-space told the kernel to use this memory, you should not be able
> to then remove it, without unloading the kexec image first.

It's not in use before you do the "kexec -e" IMHO.

> Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up
> like this?

No, not at all. I think this should be fixed if this is possible.

>
>> I would really much rather want to see user-space choosing boot memory
>> (e.g., renaming hotplugged memory on all architectures), and checking
>> during "kexec -e" if the selected memory is actually "there", before
>> trying to write to it.
>
> How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass
> something between 'load' and 'exec'. How do you keep existing user-space working
> as much as possible?

If we use new types (e.g., "System RAM (hotplugged)"), it looks like
most of kexec will continue working (memory will be treated like
RANGE_RESERVED or ignored).

I guess we would still have to teach kexec-tools the new types,
primarily so that the crash memory ranges keep getting detected
properly. (no idea how they are used, will have to take a closer look)

>
> What do you do if the memory isn't there? User-space just called reboot(), it
> would be better to avoid getting into the situation where we have to fail that call.

In kernel_kexec() we already fail if there is no kernel image loaded, so
we can similarly simply fail if the kernel image cannot be moved to the
target memory IMHO.

--
Thanks,

David / dhildenb
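
As a rough illustration of the check discussed above (refusing to proceed when a segment's target memory is not "there"), here is a minimal user-space sketch in C. The struct and function names are made up for illustration and are not kernel APIs. It also assumes the RAM ranges have already been merged, so a segment never needs to span two entries; a real check would have to consult the kernel's own view of system RAM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one range of present system RAM. */
struct ram_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Hypothetical stand-in for the destination of one kexec segment. */
struct seg_target {
	uint64_t mem;	/* destination start address */
	uint64_t memsz;	/* destination size in bytes */
};

/*
 * Return true if [mem, mem + memsz) lies entirely inside a single RAM
 * range. Assumption: adjacent ranges were merged beforehand.
 */
static bool segment_in_ram(const struct seg_target *seg,
			   const struct ram_range *ram, size_t nr_ram)
{
	uint64_t seg_end = seg->mem + seg->memsz;
	size_t i;

	/* Reject empty segments and address wrap-around. */
	if (seg->memsz == 0 || seg_end < seg->mem)
		return false;

	for (i = 0; i < nr_ram; i++) {
		if (seg->mem >= ram[i].start && seg_end <= ram[i].end)
			return true;
	}
	return false;
}

/* Return true only if every segment target is backed by present RAM. */
static bool all_segments_in_ram(const struct seg_target *segs, size_t nr_segs,
				const struct ram_range *ram, size_t nr_ram)
{
	size_t i;

	for (i = 0; i < nr_segs; i++) {
		if (!segment_in_ram(&segs[i], ram, nr_ram))
			return false;
	}
	return true;
}
```

The same predicate could in principle be evaluated at load time and again right before the image is moved to its target, so that memory removed in between leads to a failed call rather than a crash.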