On 22.04.20 11:57, Baoquan He wrote:
> On 04/22/20 at 11:24am, David Hildenbrand wrote:
>> On 22.04.20 11:17, Baoquan He wrote:
>>> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
>>>>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer.
>>>>>> If we don't pass the efi, it won't get the SRAT table correctly, if
>>>>>> I remember correctly. Yeah, I remember kvm guest can get memory
>>>>>> hotplugged with ACPI only, this won't happen on bare metal though.
>>>>>> Need to check carefully. I have been using kvm guest with uefi
>>>>>> firmware recently.
>>>>>
>>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>>>>
>>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>>>>> not part of any efi tables whatsoever. So I assume the kexec kernel
>>>>> will not detect it automatically (good!), but instead load the
>>>>> virtio-mem driver and let it add memory back to the system.
>>>>>
>>>>> I should probably play with kexec and virtio-mem once I have some spare
>>>>> cycles ... to find out what's broken and needs to be addressed :)
>>>>
>>>> FWIW, I just gave virtio-mem and kexec/kdump a try.
>>>>
>>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
>>>> The kexec kernel only uses memory in the crash region. The virtio-mem
>>>> driver properly bails out due to is_kdump_kernel().
>>>
>>> Right, kdump is not impacted by later added memory.
>>>
>>>>
>>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>>>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>>>> search). Memory added by virtio-mem is not getting added to the e820
>>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>>>> right memory is re-added.
>>>
>>> kexec_file_load just behaves as you tested. It doesn't collect later
>>> added memory to e820 because it uses e820_table_kexec directly to pass
>>> e820 to kexec-ed kernel.
>>> However, this e820_table_kexec is only updated
>>> during boot stage. I tried hot adding a DIMM after boot: the kexec-ed
>>> kernel doesn't have it in e820 during bootup, but it's recognized and
>>> added during ACPI scanning. I think we should update e820_table_kexec
>>> when hot adding/removing memory, at least for DIMMs. Not sure if DLPAR,
>>> virtio-mem, balloon will need to be added into e820_table_kexec too,
>>> and if this is expected behaviour.
>>>
>>> But whatever we do, it won't impact the kexec file loading, because of
>>> the bottom-up searching strategy. Just adding them into e820_table_kexec
>>> will make it consistent with cold reboot, which recognizes them and
>>> gets them into e820 during bootup.
>>
>> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed
>> kernel should see. Not more, not less.
>>
>> Regarding virtio-mem: Not in e820 on cold-boot.
>> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
>> IIRC. I think on real HW it can be different.
>
> Yeah, DIMMs under KVM won't show up in e820 map. This is not a feature
> of QEMU/KVM though, but a defect of it. I once asked Igor, who is a
> developer of QEMU/KVM guest in this area, why we don't make kvm guest
> recognize hotpluggable DIMMs and add them into the e820 map. He said he
> had tried to do it, but this will corrupt guest on HyperV. So he had to
> revert the

Yeah, I remember that this had to be reverted due to something breaking.
But OTOH, it allows us to online coldplugged DIMMs online_movable easily,
so I'd say it's even a feature (although it does not behave like the real
HW we have). I use this extensively when testing memory hot(un)plug via
coldplugged DIMMs.

I do wonder if there is real HW where this is also the case.

> commit on qemu. So I think we can leave it for now for both real HW and
> kvm, or update the e820_table_kexec to include added DIMM for both real
> HW and KVM.
> I hope one day KVM dev will find a way to conquer the defect
> on HyperV and make the e820map consistent with bare metal. After all,
> kvm guest is trying to imitate real HW for the most part.
>
> Anyway, I will think about the e820_table_kexec updating. See if we can
> do something about it.

Yeah, for DIMMs on real HW it might definitely make sense. We might be
able to hook into updates of /sys/firmware/memmap on memory add/remove.

-- 
Thanks,

David / dhildenb
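
For anyone who wants to compare what the firmware-provided map contains
before and after a hot add, here is a minimal userspace sketch that dumps
/sys/firmware/memmap. It assumes the usual layout of numbered
subdirectories, each holding "start", "end" and "type" files with hex
addresses. The function name read_firmware_memmap() and its "root"
parameter are made up for illustration; this is not how the kernel or
kexec-tools handles it.

```python
import os


def read_firmware_memmap(root="/sys/firmware/memmap"):
    """Return a list of (start, end, type) tuples sorted by start address.

    Assumption: every numbered subdirectory of `root` contains the files
    'start', 'end' and 'type', with start/end written as hex (e.g. 0x100000).
    An empty list is returned if `root` does not exist.
    """
    entries = []
    if not os.path.isdir(root):
        return entries
    for name in os.listdir(root):
        entry_dir = os.path.join(root, name)
        # Skip anything that is not a numbered entry directory.
        if not (name.isdigit() and os.path.isdir(entry_dir)):
            continue
        fields = {}
        for key in ("start", "end", "type"):
            with open(os.path.join(entry_dir, key)) as f:
                fields[key] = f.read().strip()
        entries.append(
            (int(fields["start"], 16), int(fields["end"], 16), fields["type"])
        )
    entries.sort()
    return entries


if __name__ == "__main__":
    for start, end, kind in read_firmware_memmap():
        print(f"{start:#018x}-{end:#018x} {kind}")
```

Running it once after boot and again after hot adding memory, then
diffing the output, should show whether the added range made it into the
map on a given setup.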