Hi James, Thanks for the comments. 2018-04-10 22:15 GMT+08:00, James Morse <james.morse@xxxxxxx>: > Hi Dongjiu Geng, > > On 09/04/18 22:36, Dongjiu Geng wrote: >> This new IOCTL exports user-invisible states related to SError. >> Together with appropriate user space changes, it can inject >> SError with specified syndrome to guest by setup kvm_vcpu_events >> value. > >> Also it can support live migration. > > Could you explain what user-space is expected to do for this? > (this is also relevant for snapshot-ing/suspending VMs) Ok. > > It's probably worth noting that this solves an existing problem: KVM may > make an > SError pending, but user-space has no way to discover/migrate this. if KVM make an SError pending, when user-space do migration, it get the kvm_vcpu_events through KVM_GET_VCPU_EVENTS, then can find that pending status. What are the things you're worried about? > > >> diff --git a/Documentation/virtual/kvm/api.txt >> b/Documentation/virtual/kvm/api.txt >> index 8a3d708..45719b4 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -819,11 +819,13 @@ struct kvm_clock_data { >> >> Capability: KVM_CAP_VCPU_EVENTS >> Extended by: KVM_CAP_INTR_SHADOW >> -Architectures: x86 >> +Architectures: x86, arm, arm64 >> Type: vm ioctl >> Parameters: struct kvm_vcpu_event (out) >> Returns: 0 on success, -1 on error >> >> +X86: >> + >> Gets currently pending exceptions, interrupts, and NMIs as well as >> related >> states of the vcpu. >> >> @@ -865,15 +867,31 @@ Only two fields are defined in the flags field: >> - KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that >> smi contains a valid state. >> >> +ARM, ARM64: >> + >> +Gets currently pending SError exceptions as well as related states of the >> vcpu. >> + >> +struct kvm_vcpu_events { >> + struct { >> + __u8 serror_pending; >> + __u8 serror_has_esr; >> + /* Align it to 4 bytes */ >> + __u8 pad[2]; >> + __u64 serror_esr; >> + } exception; >> +}; >> + > > I'm not convinced we should change this struct from the layout/size x86 has. > Its > confusing for the documentation, is this API call really the same on all > architectures? > > What if we want to add some future interrupt, NMI or related state? We've > found > ourselves needing to add this API, it seems odd to remove its other uses on > x86. > We can't put them back in the future. > > Having a different layout would force user-space to ifdef/duplicate any > code > that accesses this between architectures. In x86 and arm64 user space code, the handling logic of KVM_GET/SET_VCPU_EVENTS is in different ARCH folder, maybe it is not necessary to share the handling code in the user space. > > > > The compiler will want that __u64 to be naturally aligned to 8-bytes, so > your > 4-byte padding still causes some secret compiler-padding to be inserted. > Different versions of the compiler may put it in different places. > > >> 4.32 KVM_SET_VCPU_EVENTS >> >> Capability: KVM_CAP_VCPU_EVENTS >> Extended by: KVM_CAP_INTR_SHADOW >> -Architectures: x86 >> +Architectures: x86, arm, arm64 >> Type: vm ioctl >> Parameters: struct kvm_vcpu_event (in) >> Returns: 0 on success, -1 on error >> >> +X86: >> + >> Set pending exceptions, interrupts, and NMIs as well as related states of >> the >> vcpu. >> >> @@ -894,6 +912,12 @@ shall be written into the VCPU. >> >> KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. >> >> +ARM, ARM64: >> + >> +Set pending SError exceptions as well as related states of the vcpu. >> + >> +See KVM_GET_VCPU_EVENTS for the data structure. >> + >> >> 4.33 KVM_GET_DEBUGREGS >> > > >> diff --git a/arch/arm64/include/uapi/asm/kvm.h >> b/arch/arm64/include/uapi/asm/kvm.h >> index 9abbf30..855cc9a 100644 >> --- a/arch/arm64/include/uapi/asm/kvm.h >> +++ b/arch/arm64/include/uapi/asm/kvm.h >> @@ -39,6 +39,7 @@ >> #define __KVM_HAVE_GUEST_DEBUG >> #define __KVM_HAVE_IRQ_LINE >> #define __KVM_HAVE_READONLY_MEM >> +#define __KVM_HAVE_VCPU_EVENTS >> >> #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 >> >> @@ -153,6 +154,17 @@ struct kvm_sync_regs { >> struct kvm_arch_memory_slot { >> }; >> >> +/* for KVM_GET/SET_VCPU_EVENTS */ >> +struct kvm_vcpu_events { >> + struct { >> + __u8 serror_pending; >> + __u8 serror_has_esr; > >> + /* Align it to 4 bytes */ >> + __u8 pad[2]; > > (padding noted above) > > >> + __u64 serror_esr; >> + } exception; >> +}; >> + >> /* If you need to interpret the index values, here is the key: */ >> #define KVM_REG_ARM_COPROC_MASK 0x000000000FFF0000 >> #define KVM_REG_ARM_COPROC_SHIFT 16 > > >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c >> index 5c7f657..42e1222 100644 >> --- a/arch/arm64/kvm/guest.c >> +++ b/arch/arm64/kvm/guest.c >> @@ -277,6 +277,37 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu >> *vcpu, >> return -EINVAL; >> } >> >> +int kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu, >> + struct kvm_vcpu_events *events) >> +{ >> + events->exception.serror_pending = (vcpu_get_hcr(vcpu) & HCR_VSE); >> + events->exception.serror_has_esr = >> + cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && >> + (!!vcpu_get_vsesr(vcpu)); > >> + events->exception.serror_esr = vcpu_get_vsesr(vcpu); > > This will return a stale ESR even if nothing is pending. On systems without > the > RAS extensions it will return 'ESR_ELx_ISV' if kvm_inject_vabt() has ever > been > called for this CPU. > > Could we hide this behind (pending && has_esr), setting it to 0 otherwise. > This > is just to avoid exposing the stale value. Exactly, it is indeed. > > >> + >> + return 0; >> +} > >> +int kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, >> + struct kvm_vcpu_events *events) >> +{ >> + bool injected = events->exception.serror_pending; >> + bool has_esr = events->exception.serror_has_esr; >> + >> + if (injected && has_esr) { >> + if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) >> + return -EINVAL; >> + >> + kvm_set_sei_esr(vcpu, events->exception.serror_esr); >> + >> + } else if (injected) { >> + kvm_inject_vabt(vcpu); > > Nit: looks like 'injected' is misnamed. "injected" change to "pending"? > > >> + } >> + >> + return 0; >> +} >> + >> int __attribute_const__ kvm_target_cpu(void) >> { >> unsigned long implementor = read_cpuid_implementor(); > > >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c >> index 38c8a64..20e919a 100644 >> --- a/arch/arm64/kvm/reset.c >> +++ b/arch/arm64/kvm/reset.c >> @@ -82,6 +82,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, >> long ext) >> break; >> case KVM_CAP_SET_GUEST_DEBUG: >> case KVM_CAP_VCPU_ATTRIBUTES: >> + case KVM_CAP_VCPU_EVENTS: >> r = 1; >> break; >> default: > >> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >> index 7e3941f..945655d 100644 >> --- a/virt/kvm/arm/arm.c >> +++ b/virt/kvm/arm/arm.c >> @@ -1051,6 +1051,27 @@ long kvm_arch_vcpu_ioctl(struct file *filp, >> return -EFAULT; >> return kvm_arm_vcpu_has_attr(vcpu, &attr); >> } >> + case KVM_GET_VCPU_EVENTS: { >> + struct kvm_vcpu_events events; >> + >> + memset(&events, 0, sizeof(struct kvm_vcpu_events)); > > sizeof(events) is the normal style here, it means if someone changes > event's > type we don't get any surprises... Ok, thanks > > >> + if (kvm_arm_vcpu_get_events(vcpu, &events)) >> + return -EINVAL; >> + >> + if (copy_to_user(argp, &events, sizeof(struct kvm_vcpu_events))) > > sizeof(events) thanks > > >> + return -EFAULT; >> + >> + return 0; >> + } >> + case KVM_SET_VCPU_EVENTS: { >> + struct kvm_vcpu_events events; >> + >> + if (copy_from_user(&events, argp, >> + sizeof(struct kvm_vcpu_events))) >> + return -EFAULT; >> + >> + return kvm_arm_vcpu_set_events(vcpu, &events); >> + } >> default: >> return -EINVAL; >> } >> > > Despite KVM_CAP_VCPU_EVENTS not being advertised on 32bit, any attempt to > call > it will still end up in here, but will always fail as the {g,s}et_events() > calls > always return -EINVAL. I don't think this will cause us any problems. What are the things you're worried about? > > > Thanks, > > James > _______________________________________________ > kvmarm mailing list > kvmarm@xxxxxxxxxxxxxxxxxxxxx > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm > -- To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html