On 2017/9/20 15:16, Auger Eric wrote: > Hi Wanghaibin, > > On 20/09/2017 03:57, wanghaibin wrote: >> On 2017/9/6 21:05, wanghaibin wrote: >> >>> We have a test scenario: vmlife and migrate fixed test. >>> >>> Here is a problem; VM migration failed caused the qemu core which gdb trace: >> >> >> I tested migration with these patches in these days, and here is another scene can expose another problem. >> >> scene: migrate at the moment which vm boot in UEFI (EDK2, before the guest kernel boot) while vm reboot. >> core file(at destination) gdb trace: >> (gdb) bt >> #0 0x0000ffff978dae84 in raise () from /usr/lib64/libc.so.6 >> #1 0x0000ffff978dcb80 in abort () from /usr/lib64/libc.so.6 >> #2 0x0000000000465468 in kvm_device_access () >> #3 0x00000000004a2d68 in kvm_arm_its_post_load () >> #4 0x0000000000629038 in gicv3_its_post_load () >> #5 0x00000000006c042c in vmstate_load_state () >> #6 0x000000000047acf8 in qemu_loadvm_section_start_full () >> #7 0x000000000047b234 in qemu_loadvm_state_main () >> #8 0x000000000047cdb8 in qemu_loadvm_state () >> #9 0x00000000006bf3cc in process_incoming_migration_co () >> #10 0x00000000007b40f0 in coroutine_trampoline () >> #11 0x0000ffff978ebfa0 in ?? () from /usr/lib64/libc.so.6 >> >> The reason of the bug is vITS device doesn't reset it's registers value correctly, in qemu vits device reset logic, the iidr is set to 0, >> and the kvm_arm_its_post_load check the iidr is 0 and return (without put any registers). > > That's correct this is what currently prevents from writing the > registers on reset. My initial goal was that this code would be executed > only on restore() and not on reset(), hence this check. > >> >> Here maybe a problem when migrate at that moment (vm boot in UEFI), the restore interface can check the corresponding BASERn registers >> valid successfully, and flush info from guest memory to kvm will failed in future (for example, the BASER[CT].valid=1, but the CTE maybe invalid). >> >> So, we should fix the vITS device reset interface, and there may be a rule that needs clarification: >> In arm-vgic-its.txt, it has been described about ITS restore sequence: >> d) restore the ITS in the following order: >> 1. Restore GITS_CBASER >> 2. Restore all other GITS_ registers, except GITS_CTLR! >> 3. Load the ITS table data (KVM_DEV_ARM_ITS_RESTORE_TABLES) >> 4. Restore GITS_CTLR >> >> But this sequence is inappropriate for reset interface based on our currently implementation , for example, reset interface write creadr/baser with >> the initial value must rely on its disable firstly. > > After the discussion with Christoffer, I think the preferred solution is > to introduce a new KVM ITS device reset IOCTL and not write individual > registers as this sequence would do. Anyway as you point ed out, this > sequence does not work as is. Typically writing CREADR/CWRITER with 0 > fails at the moment because CBASER == 0. > > So may I suggest I follow up with an implementation of the reset IOCTL + > some cleanup patchs? Please an you respin the restore fix according to > what we discussed? Sure, Thanks. > > Thanks > > Eric >> >> So, minimal change (fix in qemu) maybe like: >> >> diff --git a/hw/intc/arm_gicv3_its_kvm.c b/hw/intc/arm_gicv3_its_kvm.c >> index 39903d5..9602dd3 100644 >> --- a/hw/intc/arm_gicv3_its_kvm.c >> +++ b/hw/intc/arm_gicv3_its_kvm.c >> @@ -160,7 +160,8 @@ static void kvm_arm_its_post_load(GICv3ITSState *s) >> int i; >> >> if (!s->iidr) { >> - return; >> + kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS, >> + GITS_CTLR, &s->ctlr, true, &error_abort); >> } >> >> kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS, >> @@ -190,8 +191,10 @@ static void kvm_arm_its_post_load(GICv3ITSState *s) >> KVM_DEV_ARM_ITS_RESTORE_TABLES, NULL, true, >> &error_abort); >> >> - kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS, >> - GITS_CTLR, &s->ctlr, true, &error_abort); >> + if (s->iidr) { >> + kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_REGS, >> + GITS_CTLR, &s->ctlr, true, &error_abort); >> + } >> } >> >> Any suggestion ? >> >> Thanks. >> >>> >>> #0 0x0000ffffb023fe84 in raise () from /usr/lib64/libc.so.6 >>> #1 0x0000ffffb0241b80 in abort () from /usr/lib64/libc.so.6 >>> #2 0x000000000046b408 in kvm_device_access (fd=<optimized out>, group=4, attr=1, val=<optimized out>, write=<optimized out>) >>> at /usr/src/debug/qemu-2.6.0/kvm-all.c:2064 >>> #3 0x000000000059ade8 in vm_state_notify (running=running@entry=0, state=state@entry=RUN_STATE_FINISH_MIGRATE) at vl.c:1728 >>> #4 0x000000000045736c in do_vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:744 >>> #5 0x00000000004573bc in vm_stop (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1434 >>> #6 0x0000000000457428 in vm_stop_force_state (state=state@entry=RUN_STATE_FINISH_MIGRATE) at /usr/src/debug/qemu-2.6.0/cpus.c:1442 >>> #7 0x00000000006d68d0 in migration_completion (s=s@entry=0xb92d60 <current_migration.37525>, current_active_state=4, >>> current_active_state@entry=65535, old_vm_running=0xffffb03c2000, old_vm_running@entry=0xfffe934fc53f, >>> start_time=0xfffe934fcce0, start_time@entry=0xfffe934fc540) at migration/migration.c:1798 >>> #8 0x00000000006d7480 in migration_thread (opaque=0xb92d60 <current_migration.37525>) at migration/migration.c:1995 >>> #9 0x0000ffffb0398dc4 in start_thread () from /usr/lib64/libpthread.so.0 >>> #10 0x0000ffffb02ef020 in thread_start () from /usr/lib64/libc.so.6 >>> >>> qemu failed log : >>> 2017-08-28T12:34:29.654396Z qemu-kvm: KVM_SET_DEVICE_ATTR failed: Invalid argument >>> >>> I try to debug it, and I found the migrate at the moment that the vm is still booting. >>> >>> So just change the guest like : >>> diff --git a/drivers/pci/host/pci-host-common.c b/drivers/pci/host/pci-host-common.c >>> index e9a53ba..d73fc03 100644 >>> --- a/drivers/pci/host/pci-host-common.c >>> +++ b/drivers/pci/host/pci-host-common.c >>> @@ -23,6 +23,7 @@ >>> #include <linux/of_pci.h> >>> #include <linux/pci-ecam.h> >>> #include <linux/platform_device.h> >>> +#include <linux/delay.h> >>> >>> static int gen_pci_parse_request_of_pci_ranges(struct device *dev, >>> struct list_head *resources, struct resource **bus_range) >>> @@ -119,6 +120,7 @@ int pci_host_common_probe(struct platform_device *pdev, >>> struct pci_bus *bus, *child; >>> struct pci_config_window *cfg; >>> struct list_head resources; >>> + int i; >>> >>> type = of_get_property(np, "device_type", NULL); >>> if (!type || strcmp(type, "pci")) { >>> @@ -164,6 +166,8 @@ int pci_host_common_probe(struct platform_device *pdev, >>> pcie_bus_configure_settings(child); >>> } >>> >>> + for (i=0;i<20000; i++) >>> + udelay(1000); >>> pci_bus_add_devices(bus); >>> return 0; >>> } >>> >>> And migrate at this delay time, It must be failed. >>> >>> This patchset try to fix this problem. >>> >>> BTW: This patchset just a demo, haven't do more test, and the implement maybe a little evil. >>> >>> Thanks >>> >>> >>> >>> wanghaibin (3): >>> kvm: arm/arm64: vgic-vits: separate vgic_its_free_list() function >>> kvm: arm/arm64: vgic-vits: free its resource when vm reboot/reset >>> kvm: arm/arm64: vgic-its: fix return value for restore >>> >>> virt/kvm/arm/arm.c | 5 ++++- >>> virt/kvm/arm/vgic/vgic-its.c | 26 +++++++++++++++++++------- >>> virt/kvm/arm/vgic/vgic.h | 1 + >>> 3 files changed, 24 insertions(+), 8 deletions(-) >>> >> >> >> >> _______________________________________________ >> kvmarm mailing list >> kvmarm@xxxxxxxxxxxxxxxxxxxxx >> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm >> > > . > _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm