Hi Igor,

Thanks for your response, first of all. :)

> -----Original Message-----
> From: Igor Mammedov [mailto:imammedo@xxxxxxxxxx]
> Sent: Friday, June 01, 2018 6:23 PM
>
> On Fri, 1 Jun 2018 08:17:12 +0000
> xuyandong <xuyandong2@xxxxxxxxxx> wrote:
>
> > Hi there,
> >
> > I am doing some tests on qemu vcpu hotplug and I have run into some trouble.
> > An emulation failure occurs and qemu prints the following msg:
> >
> > KVM internal error. Suberror: 1
> > emulation failure
> > EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000600
> > ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000fff8
> > EIP=0000ff53 EFL=00010082 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 000f0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT= 00000000 0000ffff
> > IDT= 00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3 <cf> 66 68 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 c6
> >
> > I notice that the guest is still running SeaBIOS in real mode when the vcpu has
> > just been plugged.
> > This emulation failure can be steadily reproduced if I do a vcpu hotplug
> > during the VM launch process.
> > After some digging, I found that this KVM internal error shows up because KVM
> > cannot emulate some MMIO (gpa 0xfff53).
> >
> > So I am confused:
> > (1) does qemu support vcpu hotplug even if the guest is running seabios ?
> There is no code that forbids it, and I would expect it not to trigger an error
> and to be a NOP.
> > (2) the gpa (0xfff53) is an address in the BIOS ROM section, why does kvm
> > incorrectly treat it as an mmio address?
> KVM trace and bios debug log might give more information to guess where to
> look, or even better would be to debug Seabios and find out what exactly
> goes wrong, if you could do it.

Unfortunately, this issue can't be reproduced when we enable the SeaBIOS debug log or KVM tracing. :(

After a few days of debugging, we found that this problem occurs every time the memory region is cleared (memory_size is 0) while a VFIO device is hot-plugged. The key function is kvm_set_user_memory_region(); I added some logs to it:

gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc751e00000, mem.flags=0, memory_size=0x20000
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc751e00000, mem.flags=0, memory_size=0x0
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x10000
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0xbff40000
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0xbff40000
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x0
gonglei********: mem.slot: 3, mem.guest_phys_addr=0xc0000, mem.userspace_addr=0x7fc343ec0000, mem.flags=0, memory_size=0x9000

When the memory region is cleared, KVM marks the slot as invalid (it is set to KVM_MEMSLOT_INVALID).
If SeaBIOS accesses this memory during that window and causes a page fault, the gfn lookup (via __gfn_to_pfn_memslot) hits the invalid slot, gets an error pfn back, and the access ultimately fails and falls into emulation. The call chain in KVM is as follows:

kvm_mmu_page_fault
  tdp_page_fault
    try_async_pf
      __gfn_to_pfn_memslot
    __direct_map // return true;
  x86_emulate_instruction
    handle_emulation_failure

The call stack in QEMU is as follows:

Breakpoint 1, kvm_set_user_memory_region (kml=0x564aa1e2c890, slot=0x564aa1e2d230) at /mnt/sdb/gonglei/qemu/kvm-all.c:261
(gdb) bt
#0  kvm_set_user_memory_region (kml=0x564aa1e2c890, slot=0x564aa1e2d230) at /mnt/sdb/gonglei/qemu/kvm-all.c:261
#1  0x0000564a9e7e3096 in kvm_set_phys_mem (kml=0x564aa1e2c890, section=0x7febeb296500, add=false) at /mnt/sdb/gonglei/qemu/kvm-all.c:887
#2  0x0000564a9e7e34c7 in kvm_region_del (listener=0x564aa1e2c890, section=0x7febeb296500) at /mnt/sdb/gonglei/qemu/kvm-all.c:999
#3  0x0000564a9e7ea884 in address_space_update_topology_pass (as=0x564a9f2b2640 <address_space_memory>, old_view=0x7febdc3449c0, new_view=0x7febdc3443c0, adding=false) at /mnt/sdb/gonglei/qemu/memory.c:849
#4  0x0000564a9e7eac49 in address_space_update_topology (as=0x564a9f2b2640 <address_space_memory>) at /mnt/sdb/gonglei/qemu/memory.c:890
#5  0x0000564a9e7ead40 in memory_region_commit () at /mnt/sdb/gonglei/qemu/memory.c:933
#6  0x0000564a9e7eae26 in memory_region_transaction_commit () at /mnt/sdb/gonglei/qemu/memory.c:950
#7  0x0000564a9ea53e05 in i440fx_update_memory_mappings (d=0x564aa2089280) at hw/pci-host/piix.c:155
#8  0x0000564a9ea53ea3 in i440fx_write_config (dev=0x564aa2089280, address=88, val=286330880, len=4) at hw/pci-host/piix.c:168
#9  0x0000564a9ea63ad1 in pci_host_config_write_common (pci_dev=0x564aa2089280, addr=88, limit=256, val=286330880, len=4) at hw/pci/pci_host.c:66
#10 0x0000564a9ea63bf8 in pci_data_write (s=0x564aa2038d70, addr=2147483736, val=286330880, len=4) at hw/pci/pci_host.c:100
#11 0x0000564a9ea63d1a in pci_host_data_write (opaque=0x564aa2036ae0, addr=0, val=286330880, len=4) at hw/pci/pci_host.c:153
#12 0x0000564a9e7e9384 in memory_region_write_accessor (mr=0x564aa2036ee0, addr=0, value=0x7febeb2967c8, size=4, shift=0, mask=4294967295, attrs=...) at /mnt/sdb/gonglei/qemu/memory.c:527
#13 0x0000564a9e7e958f in access_with_adjusted_size (addr=0, value=0x7febeb2967c8, size=4, access_size_min=1, access_size_max=4, access=0x564a9e7e92a3 <memory_region_write_accessor>, mr=0x564aa2036ee0, attrs=...) at /mnt/sdb/gonglei/qemu/memory.c:593
#14 0x0000564a9e7ebd00 in memory_region_dispatch_write (mr=0x564aa2036ee0, addr=0, data=286330880, size=4, attrs=...) at /mnt/sdb/gonglei/qemu/memory.c:1338
#15 0x0000564a9e798cda in address_space_write_continue (as=0x564a9f2b2520 <address_space_io>, addr=3324, attrs=..., buf=0x7febf8170000 "", len=4, addr1=0, l=4, mr=0x564aa2036ee0) at /mnt/sdb/gonglei/qemu/exec.c:3167
#16 0x0000564a9e798e99 in address_space_write (as=0x564a9f2b2520 <address_space_io>, addr=3324, attrs=..., buf=0x7febf8170000 "", len=4) at /mnt/sdb/gonglei/qemu/exec.c:3224
#17 0x0000564a9e7991c7 in address_space_rw (as=0x564a9f2b2520 <address_space_io>, addr=3324, attrs=..., buf=0x7febf8170000 "", len=4, is_write=true) at /mnt/sdb/gonglei/qemu/exec.c:3326
#18 0x0000564a9e7e55f6 in kvm_handle_io (port=3324, attrs=..., data=0x7febf8170000, direction=1, size=4, count=1) at /mnt/sdb/gonglei/qemu/kvm-all.c:1995
#19 0x0000564a9e7e5c35 in kvm_cpu_exec (cpu=0x564aa1e63580) at /mnt/sdb/gonglei/qemu/kvm-all.c:2189
#20 0x0000564a9e7ccc1a in qemu_kvm_cpu_thread_fn (arg=0x564aa1e63580) at /mnt/sdb/gonglei/qemu/cpus.c:1078
#21 0x0000564a9ec4afcb in qemu_thread_start (args=0x564aa1e8b490) at util/qemu-thread-posix.c:496
#22 0x00007febf4efedc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007febf1ce079d in clone () from /lib64/libc.so.6

So, my questions are:
1) Why don't we hold kvm->slots_lock during page fault processing?
2) How do we ensure that vcpus will not access the corresponding region while a memory slot is being deleted?

Thanks,
-Gonglei