On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> >> pass the efi, it won't get the SRAT table correctly, if I remember
> >> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >> ACPI only, this won't happen on bare metal though. Need check carefully.
> >> I have been using kvm guest with uefi firmware recently.
> >
> > Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >
> > I'm also asking because of virtio-mem. Memory added via virtio-mem is
> > not part of any efi tables or whatsoever. So I assume the kexec kernel
> > will not detect it automatically (good!), instead load the virtio-mem
> > driver and let it add memory back to the system.
> >
> > I should probably play with kexec and virtio-mem once I have some spare
> > cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().

Right, kdump is not impacted by memory added later.

>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

kexec_file_load behaves just as you tested. It doesn't collect memory
added later into e820 because it uses e820_table_kexec directly to pass
e820 to the kexec-ed kernel. However, e820_table_kexec is only updated
during the boot stage. I tried hot adding a DIMM after boot: the kexec-ed
kernel doesn't have it in e820 during bootup, but it's recognized and
added during ACPI scanning.

I think we should update e820_table_kexec when hot adding/removing
memory, at least for DIMMs. Not sure if DLPAR, virtio-mem, or balloon
memory will need to be added into e820_table_kexec too, and if this is
the expected behaviour. But whatever we do, it won't impact kexec file
loading, because of the bottom-up searching strategy. Just adding them
into e820_table_kexec will make it consistent with a cold reboot, which
recognizes them and puts them into e820 during bootup.

>
> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

Yes, kexec_load will read memory regions from /sys/firmware/memmap/ or
/proc/iomem. Making it right seems a little harder: we can export them
to /proc/iomem or /sys/firmware/memmap/ and mark them as 'hotplug', but
which zone they belong to is not easy to tell.

We are proactively testing kexec_file_load widely on x86_64, s390, and
arm64 by adding test cases into CKI.

>
>
> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.
>
> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").
>
> Baoquan, any opinion on that?
>
> --
> Thanks,
>
> David / dhildenb
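
To make option b) a bit more concrete, here is a rough sketch of the
filtering a user-space loader would need if /proc/iomem carried the
labels suggested above. It is only an illustration in Python, not
kexec-tools code: the "System RAM (virtio-mem)" label is the proposal
from this thread and is not exported by the kernel today, and the
function name boot_ram_ranges and the sample addresses are made up.

```python
# Sketch only: the "(virtio-mem)" suffix is the label proposed in this
# thread; the sample ranges below are invented for illustration.
import re

# Top-level /proc/iomem entry: "<start>-<end> : <name>", addresses in hex.
IOMEM_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def boot_ram_ranges(iomem_text):
    """Return (start, end) for top-level plain "System RAM" entries only."""
    ranges = []
    for line in iomem_text.splitlines():
        if line[:1].isspace():
            continue  # nested resource such as "Kernel code", not a RAM range
        m = IOMEM_LINE.match(line)
        if m is None:
            continue
        start, end, name = m.groups()
        # Exact match: "System RAM (virtio-mem)" and similar are left alone.
        if name.strip() == "System RAM":
            ranges.append((int(start, 16), int(end, 16)))
    return ranges


SAMPLE = """\
00001000-0009fbff : System RAM
00100000-bffdffff : System RAM
  01000000-01c031d0 : Kernel code
100000000-13fffffff : System RAM
140000000-17fffffff : System RAM (virtio-mem)
"""

for start, end in boot_ram_ranges(SAMPLE):
    print(f"{start:#x}-{end:#x}")
```

With an exact match on the name, anything carrying an extra label stays
out of the e820 map built for the new kernel, and the driver can add it
back once it loads. Which zone such memory should end up in is still
not covered by this.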