Re: [RFC PATCH 0/3] fix migrate failed when vm is in booting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2017/9/6 21:05, wanghaibin wrote:

> We have a test scenario: vmlife and migrate fixed test.
> 
> Here is a problem; VM migration failed caused the qemu core which gdb trace:


I tested migration with these patches in these days, and here is another scene can expose another problem.

scene: migrate at the moment which vm boot in UEFI (EDK2, before the guest kernel boot) while vm reboot.
core file(at destination) gdb trace:
(gdb) bt
#0  0x0000ffff978dae84 in raise () from /usr/lib64/libc.so.6
#1  0x0000ffff978dcb80 in abort () from /usr/lib64/libc.so.6
#2  0x0000000000465468 in kvm_device_access ()
#3  0x00000000004a2d68 in kvm_arm_its_post_load ()
#4  0x0000000000629038 in gicv3_its_post_load ()
#5  0x00000000006c042c in vmstate_load_state ()
#6  0x000000000047acf8 in qemu_loadvm_section_start_full ()
#7  0x000000000047b234 in qemu_loadvm_state_main ()
#8  0x000000000047cdb8 in qemu_loadvm_state ()
#9  0x00000000006bf3cc in process_incoming_migration_co ()
#10 0x00000000007b40f0 in coroutine_trampoline ()
#11 0x0000ffff978ebfa0 in ?? () from /usr/lib64/libc.so.6

The reason of the bug is vITS device doesn't reset it's registers value correctly, in qemu vits device reset logic, the iidr is set to 0,
and the kvm_arm_its_post_load check the iidr is 0 and return (without put any registers).

Here maybe a problem when migrate at that moment (vm boot in UEFI), the restore interface can check the corresponding BASERn registers
valid successfully, and flush info from guest memory to kvm will failed in future (for example, the BASER[CT].valid=1, but the CTE maybe invalid).

So, we should fix the vITS device reset interface, and there may be a rule that needs clarification:
In arm-vgic-its.txt, it has been described about ITS restore sequence:
d) restore the ITS in the following order:
   1. Restore GITS_CBASER
   2. Restore all other GITS_ registers, except GITS_CTLR!
   3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES)
   4. Restore GITS_CTLR

But this sequence is inappropriate for reset interface based on our currently implementation , for example, reset interface write creadr/baser with
the initial value must rely on its disable firstly.

So, minimal change (fix in qemu) maybe like:

diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
index 39903d5..9602dd3 100644
--- a/hw/intc/arm_gicv3_its_kvm.c
+++ b/hw/intc/arm_gicv3_its_kvm.c
@@ -160,7 +160,8 @@ static void kvm_arm_its_post_load(GICv3ITSState *s)
     int i;

     if (!s->iidr) {
-        return;
+        kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+                          GITS_CTLR, &s->ctlr, true, &error_abort);
     }

     kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
@@ -190,8 +191,10 @@ static void kvm_arm_its_post_load(GICv3ITSState *s)
                       KVM_DEV_ARM_ITS_RESTORE_TABLES, NULL, true,
                       &error_abort);

-    kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
-                      GITS_CTLR, &s->ctlr, true, &error_abort);
+    if (s->iidr) {
+        kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
+                          GITS_CTLR, &s->ctlr, true, &error_abort);
+    }
 }

Any suggestion ?

Thanks.

> 
> #0  0x0000ffffb023fe84 in raise () from /usr/lib64/libc.so.6
> #1  0x0000ffffb0241b80 in abort () from /usr/lib64/libc.so.6
> #2  0x000000000046b408 in kvm_device_access (fd=<optimized out>, group=4, attr=1, val=<optimized out>, write=<optimized out>)
>     at /usr/src/debug/qemu-2.6.0/kvm-all.c:2064
> #3  0x000000000059ade8 in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at vl.c:1728
> #4  0x000000000045736c in do_vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:744
> #5  0x00000000004573bc in vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1434
> #6  0x0000000000457428 in vm_stop_force_state (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1442
> #7  0x00000000006d68d0 in migration_completion (s=s@entry=0xb92d60 <current_migration.37525>, current_active_state=4, 
>     current_active_state@entry=65535, old_vm_running=0xffffb03c2000, old_vm_running@entry=0xfffe934fc53f, 
>     start_time=0xfffe934fcce0, start_time@entry=0xfffe934fc540) at migration/migration.c:1798
> #8  0x00000000006d7480 in migration_thread (opaque=0xb92d60 <current_migration.37525>) at migration/migration.c:1995
> #9  0x0000ffffb0398dc4 in start_thread () from /usr/lib64/libpthread.so.0
> #10 0x0000ffffb02ef020 in thread_start () from /usr/lib64/libc.so.6
> 
> qemu failed log :
> 2017-08-28T12:34:29.654396Z qemu-kvm: KVM_SET_DEVICE_ATTR failed: Invalid argument
> 
> I try to debug it, and I found the migrate at the moment that the vm is still booting.
> 
> So just change the guest like :
> diff --git a/drivers/pci/host/pci-host-common.c b/drivers/pci/host/pci-host-common.c
> index e9a53ba..d73fc03 100644
> --- a/drivers/pci/host/pci-host-common.c
> +++ b/drivers/pci/host/pci-host-common.c
> @@ -23,6 +23,7 @@
>  #include <linux/of_pci.h>
>  #include <linux/pci-ecam.h>
>  #include <linux/platform_device.h>
> +#include <linux/delay.h>
>  
>  static int gen_pci_parse_request_of_pci_ranges(struct device *dev,
>                        struct list_head *resources, struct resource **bus_range)
> @@ -119,6 +120,7 @@ int pci_host_common_probe(struct platform_device *pdev,
>         struct pci_bus *bus, *child;
>         struct pci_config_window *cfg;
>         struct list_head resources;
> +       int i;
>  
>         type = of_get_property(np, "device_type", NULL);
>         if (!type || strcmp(type, "pci")) {
> @@ -164,6 +166,8 @@ int pci_host_common_probe(struct platform_device *pdev,
>                         pcie_bus_configure_settings(child);
>         }
>  
> +       for (i=0;i<20000; i++)
> +               udelay(1000);
>         pci_bus_add_devices(bus);
>         return 0;
>  }
> 
> And migrate at this delay time, It must be failed.
> 
> This patchset try to fix this problem.
> 
> BTW: This patchset just a demo, haven't do more test, and the implement maybe a little evil.
> 
> Thanks
> 
> 
> 
> wanghaibin (3):
>   kvm: arm/arm64: vgic-vits: separate vgic_its_free_list() function
>   kvm: arm/arm64: vgic-vits: free its resource when vm reboot/reset
>   kvm: arm/arm64: vgic-its: fix return value for restore
> 
>  virt/kvm/arm/arm.c           |  5 ++++-
>  virt/kvm/arm/vgic/vgic-its.c | 26 +++++++++++++++++++-------
>  virt/kvm/arm/vgic/vgic.h     |  1 +
>  3 files changed, 24 insertions(+), 8 deletions(-)
> 



_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm




[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux