Re: [RFC PATCH 0/3] fix migrate failed when vm is in booting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2017/9/20 15:16, Auger Eric wrote:

> Hi Wanghaibin,
> 
> On 20/09/2017 03:57, wanghaibin wrote:
>> On 2017/9/6 21:05, wanghaibin wrote:
>>
>>> We have a test scenario: vmlife and migrate fixed test.
>>>
>>> Here is a problem; VM migration failed caused the qemu core which gdb trace:
>>
>>
>> I tested migration with these patches in these days, and here is another scene can expose another problem.
>>
>> scene: migrate at the moment which vm boot in UEFI (EDK2, before the guest kernel boot) while vm reboot.
>> core file(at destination) gdb trace:
>> (gdb) bt
>> #0  0x0000ffff978dae84 in raise () from /usr/lib64/libc.so.6
>> #1  0x0000ffff978dcb80 in abort () from /usr/lib64/libc.so.6
>> #2  0x0000000000465468 in kvm_device_access ()
>> #3  0x00000000004a2d68 in kvm_arm_its_post_load ()
>> #4  0x0000000000629038 in gicv3_its_post_load ()
>> #5  0x00000000006c042c in vmstate_load_state ()
>> #6  0x000000000047acf8 in qemu_loadvm_section_start_full ()
>> #7  0x000000000047b234 in qemu_loadvm_state_main ()
>> #8  0x000000000047cdb8 in qemu_loadvm_state ()
>> #9  0x00000000006bf3cc in process_incoming_migration_co ()
>> #10 0x00000000007b40f0 in coroutine_trampoline ()
>> #11 0x0000ffff978ebfa0 in ?? () from /usr/lib64/libc.so.6
>>
>> The reason of the bug is vITS device doesn't reset it's registers value correctly, in qemu vits device reset logic, the iidr is set to 0,
>> and the kvm_arm_its_post_load check the iidr is 0 and return (without put any registers).
> 
> That's correct this is what currently prevents from writing the
> registers on reset. My initial goal was that this code would be executed
> only on restore() and not on reset(), hence this check.
> 
>>
>> Here maybe a problem when migrate at that moment (vm boot in UEFI), the restore interface can check the corresponding BASERn registers
>> valid successfully, and flush info from guest memory to kvm will failed in future (for example, the BASER[CT].valid=1, but the CTE maybe invalid).
>>
>> So, we should fix the vITS device reset interface, and there may be a rule that needs clarification:
>> In arm-vgic-its.txt, it has been described about ITS restore sequence:
>> d) restore the ITS in the following order:
>>    1. Restore GITS_CBASER
>>    2. Restore all other GITS_ registers, except GITS_CTLR!
>>    3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES)
>>    4. Restore GITS_CTLR
>>
>> But this sequence is inappropriate for reset interface based on our currently implementation , for example, reset interface write creadr/baser with
>> the initial value must rely on its disable firstly.
> 
> After the discussion with Christoffer, I think the preferred solution is
> to introduce a new KVM ITS device reset IOCTL and not write individual
> registers as this sequence would do. Anyway as you point ed out, this
> sequence does not work as is. Typically writing CREADR/CWRITER with 0
> fails at the moment because CBASER == 0.
> 
> So may I suggest I follow up with an implementation of the reset IOCTL +
> some cleanup patchs? Please an you respin the restore fix according to
> what we discussed?



Sure, Thanks.

> 
> Thanks
> 
> Eric
>>
>> So, minimal change (fix in qemu) maybe like:
>>
>> diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c
>> index 39903d5..9602dd3 100644
>> --- a/hw/intc/arm_gicv3_its_kvm.c
>> +++ b/hw/intc/arm_gicv3_its_kvm.c
>> @@ -160,7 +160,8 @@ static void kvm_arm_its_post_load(GICv3ITSState *s)
>>      int i;
>>
>>      if (!s->iidr) {
>> -        return;
>> +        kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
>> +                          GITS_CTLR, &s->ctlr, true, &error_abort);
>>      }
>>
>>      kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
>> @@ -190,8 +191,10 @@ static void kvm_arm_its_post_load(GICv3ITSState *s)
>>                        KVM_DEV_ARM_ITS_RESTORE_TABLES, NULL, true,
>>                        &error_abort);
>>
>> -    kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
>> -                      GITS_CTLR, &s->ctlr, true, &error_abort);
>> +    if (s->iidr) {
>> +        kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS,
>> +                          GITS_CTLR, &s->ctlr, true, &error_abort);
>> +    }
>>  }
>>
>> Any suggestion ?
>>
>> Thanks.
>>
>>>
>>> #0  0x0000ffffb023fe84 in raise () from /usr/lib64/libc.so.6
>>> #1  0x0000ffffb0241b80 in abort () from /usr/lib64/libc.so.6
>>> #2  0x000000000046b408 in kvm_device_access (fd=<optimized out>, group=4, attr=1, val=<optimized out>, write=<optimized out>)
>>>     at /usr/src/debug/qemu-2.6.0/kvm-all.c:2064
>>> #3  0x000000000059ade8 in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at vl.c:1728
>>> #4  0x000000000045736c in do_vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:744
>>> #5  0x00000000004573bc in vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1434
>>> #6  0x0000000000457428 in vm_stop_force_state (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1442
>>> #7  0x00000000006d68d0 in migration_completion (s=s@entry=0xb92d60 <current_migration.37525>, current_active_state=4, 
>>>     current_active_state@entry=65535, old_vm_running=0xffffb03c2000, old_vm_running@entry=0xfffe934fc53f, 
>>>     start_time=0xfffe934fcce0, start_time@entry=0xfffe934fc540) at migration/migration.c:1798
>>> #8  0x00000000006d7480 in migration_thread (opaque=0xb92d60 <current_migration.37525>) at migration/migration.c:1995
>>> #9  0x0000ffffb0398dc4 in start_thread () from /usr/lib64/libpthread.so.0
>>> #10 0x0000ffffb02ef020 in thread_start () from /usr/lib64/libc.so.6
>>>
>>> qemu failed log :
>>> 2017-08-28T12:34:29.654396Z qemu-kvm: KVM_SET_DEVICE_ATTR failed: Invalid argument
>>>
>>> I try to debug it, and I found the migrate at the moment that the vm is still booting.
>>>
>>> So just change the guest like :
>>> diff --git a/drivers/pci/host/pci-host-common.c b/drivers/pci/host/pci-host-common.c
>>> index e9a53ba..d73fc03 100644
>>> --- a/drivers/pci/host/pci-host-common.c
>>> +++ b/drivers/pci/host/pci-host-common.c
>>> @@ -23,6 +23,7 @@
>>>  #include <linux/of_pci.h>
>>>  #include <linux/pci-ecam.h>
>>>  #include <linux/platform_device.h>
>>> +#include <linux/delay.h>
>>>  
>>>  static int gen_pci_parse_request_of_pci_ranges(struct device *dev,
>>>                        struct list_head *resources, struct resource **bus_range)
>>> @@ -119,6 +120,7 @@ int pci_host_common_probe(struct platform_device *pdev,
>>>         struct pci_bus *bus, *child;
>>>         struct pci_config_window *cfg;
>>>         struct list_head resources;
>>> +       int i;
>>>  
>>>         type = of_get_property(np, "device_type", NULL);
>>>         if (!type || strcmp(type, "pci")) {
>>> @@ -164,6 +166,8 @@ int pci_host_common_probe(struct platform_device *pdev,
>>>                         pcie_bus_configure_settings(child);
>>>         }
>>>  
>>> +       for (i=0;i<20000; i++)
>>> +               udelay(1000);
>>>         pci_bus_add_devices(bus);
>>>         return 0;
>>>  }
>>>
>>> And migrate at this delay time, It must be failed.
>>>
>>> This patchset try to fix this problem.
>>>
>>> BTW: This patchset just a demo, haven't do more test, and the implement maybe a little evil.
>>>
>>> Thanks
>>>
>>>
>>>
>>> wanghaibin (3):
>>>   kvm: arm/arm64: vgic-vits: separate vgic_its_free_list() function
>>>   kvm: arm/arm64: vgic-vits: free its resource when vm reboot/reset
>>>   kvm: arm/arm64: vgic-its: fix return value for restore
>>>
>>>  virt/kvm/arm/arm.c           |  5 ++++-
>>>  virt/kvm/arm/vgic/vgic-its.c | 26 +++++++++++++++++++-------
>>>  virt/kvm/arm/vgic/vgic.h     |  1 +
>>>  3 files changed, 24 insertions(+), 8 deletions(-)
>>>
>>
>>
>>
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@xxxxxxxxxxxxxxxxxxxxx
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>
> 
> .
> 



_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm




[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux