On Sat, Apr 9, 2022 at 11:46 AM Oliver Upton <oupton@xxxxxxxxxx> wrote: > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows > software to request that a system be placed in the deepest possible > low-power state. Effectively, software can use this to suspend itself to > RAM. > > Unfortunately, there really is no good way to implement a system-wide > PSCI call in KVM. Any precondition checks done in the kernel will need > to be repeated by userspace since there is no good way to protect a > critical section that spans an exit to userspace. SYSTEM_RESET and > SYSTEM_OFF are equally plagued by this issue, although no users have > seemingly cared for the relatively long time these calls have been > supported. > > The solution is to just make the whole implementation userspace's > problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that > indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND. > Additionally, add a CAP to get buy-in from userspace for this new exit > type. > > Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in. > If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide > explicit documentation of userspace's responsibilites for the exit and > point to the PSCI specification to describe the actual PSCI call. > > Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx> > --- > Documentation/virt/kvm/api.rst | 39 +++++++++++++++++++++++++++++++ > arch/arm64/include/asm/kvm_host.h | 3 ++- > arch/arm64/kvm/arm.c | 12 +++++++++- > arch/arm64/kvm/psci.c | 25 ++++++++++++++++++++ > include/uapi/linux/kvm.h | 2 ++ > 5 files changed, 79 insertions(+), 2 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index d104e34ad703..24e2fac2fea7 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6015,6 +6015,7 @@ should put the acknowledged interrupt vector into the 'epr' field. > #define KVM_SYSTEM_EVENT_RESET 2 > #define KVM_SYSTEM_EVENT_CRASH 3 > #define KVM_SYSTEM_EVENT_WAKEUP 4 > + #define KVM_SYSTEM_EVENT_SUSPENDED 5 Nit: This should be KVM_SYSTEM_EVENT_SUSPEND based on the code. (a few more parts in the doc use KVM_SYSTEM_EVENT_SUSPENDED) Otherwise, Reviewed-by: Reiji Watanabe <reijiw@xxxxxxxxxx> Thanks, Reiji > __u32 type; > __u64 flags; > } system_event; > @@ -6042,6 +6043,34 @@ Valid values for 'type' are: > - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and > KVM has recognized a wakeup event. Userspace may honor this event by > marking the exiting vCPU as runnable, or deny it and call KVM_RUN again. > + - KVM_SYSTEM_EVENT_SUSPENDED -- the guest has requested a suspension of > + the VM. > + > +For arm/arm64: > +^^^^^^^^^^^^^^ > + > + KVM_SYSTEM_EVENT_SUSPENDED exits are enabled with the > + KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest invokes the PSCI > + SYSTEM_SUSPEND function, KVM will exit to userspace with this event > + type. > + > + It is the sole responsibility of userspace to implement the PSCI > + SYSTEM_SUSPEND call according to ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND". > + KVM does not change the vCPU's state before exiting to userspace, so > + the call parameters are left in-place in the vCPU registers. > + > + Userspace is _required_ to take action for such an exit. It must > + either: > + > + - Honor the guest request to suspend the VM. Userspace can request > + in-kernel emulation of suspension by setting the calling vCPU's > + state to KVM_MP_STATE_SUSPENDED. Userspace must configure the vCPU's > + state according to the parameters passed to the PSCI function when > + the calling vCPU is resumed. See ARM DEN0022D.b 5.19.1 "Intended use" > + for details on the function parameters. > + > + - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 > + "Caller responsibilities" for possible return values. > > Valid flags are: > > @@ -7756,6 +7785,16 @@ At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting > this capability will disable PMU virtualization for that VM. Usermode > should adjust CPUID leaf 0xA to reflect that the PMU is disabled. > > +8.36 KVM_CAP_ARM_SYSTEM_SUSPEND > +------------------------------- > + > +:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND > +:Architectures: arm64 > +:Type: vm > + > +When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of > +type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request. > + > 9. Known KVM API problems > ========================= > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index 46027b9b80ca..9243115c9d7b 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -137,7 +137,8 @@ struct kvm_arch { > */ > #define KVM_ARCH_FLAG_REG_WIDTH_CONFIGURED 3 > #define KVM_ARCH_FLAG_EL1_32BIT 4 > - > + /* PSCI SYSTEM_SUSPEND enabled for the guest */ > +#define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED 5 > unsigned long flags; > > /* > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index e9641b86d375..1714aa55db9c 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -97,6 +97,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > } > mutex_unlock(&kvm->lock); > break; > + case KVM_CAP_ARM_SYSTEM_SUSPEND: > + r = 0; > + set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags); > + break; > default: > r = -EINVAL; > break; > @@ -210,6 +214,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_SET_GUEST_DEBUG: > case KVM_CAP_VCPU_ATTRIBUTES: > case KVM_CAP_PTP_KVM: > + case KVM_CAP_ARM_SYSTEM_SUSPEND: > r = 1; > break; > case KVM_CAP_SET_GUEST_DEBUG2: > @@ -447,8 +452,13 @@ bool kvm_arm_vcpu_stopped(struct kvm_vcpu *vcpu) > static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu) > { > vcpu->arch.mp_state.mp_state = KVM_MP_STATE_SUSPENDED; > + > + /* > + * Since this is only called from the intended vCPU, the target vCPU is > + * guaranteed to not be running. As such there is no need to kick the > + * target to handle the request. > + */ > kvm_make_request(KVM_REQ_SUSPEND, vcpu); > - kvm_vcpu_kick(vcpu); > } > > static bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu) > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c > index 362d2a898b83..58b5e2c2ff6a 100644 > --- a/arch/arm64/kvm/psci.c > +++ b/arch/arm64/kvm/psci.c > @@ -191,6 +191,11 @@ static void kvm_psci_system_reset2(struct kvm_vcpu *vcpu) > KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2); > } > > +static void kvm_psci_system_suspend(struct kvm_vcpu *vcpu) > +{ > + kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND, 0); > +} > + > static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu) > { > int i; > @@ -296,6 +301,7 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) > { > unsigned long val = PSCI_RET_NOT_SUPPORTED; > u32 psci_fn = smccc_get_function(vcpu); > + struct kvm *kvm = vcpu->kvm; > u32 arg; > int ret = 1; > > @@ -327,6 +333,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) > case ARM_SMCCC_VERSION_FUNC_ID: > val = 0; > break; > + case PSCI_1_0_FN_SYSTEM_SUSPEND: > + case PSCI_1_0_FN64_SYSTEM_SUSPEND: > + if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags)) > + val = 0; > + break; > case PSCI_1_1_FN_SYSTEM_RESET2: > case PSCI_1_1_FN64_SYSTEM_RESET2: > if (minor >= 1) > @@ -334,6 +345,20 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) > break; > } > break; > + case PSCI_1_0_FN_SYSTEM_SUSPEND: > + kvm_psci_narrow_to_32bit(vcpu); > + fallthrough; > + case PSCI_1_0_FN64_SYSTEM_SUSPEND: > + /* > + * Return directly to userspace without changing the vCPU's > + * registers. Userspace depends on reading the SMCCC parameters > + * to implement SYSTEM_SUSPEND. > + */ > + if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags)) { > + kvm_psci_system_suspend(vcpu); > + return 0; > + } > + break; > case PSCI_1_1_FN_SYSTEM_RESET2: > kvm_psci_narrow_to_32bit(vcpu); > fallthrough; > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 64e5f9d83a7a..752e4a5c3ce6 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -445,6 +445,7 @@ struct kvm_run { > #define KVM_SYSTEM_EVENT_RESET 2 > #define KVM_SYSTEM_EVENT_CRASH 3 > #define KVM_SYSTEM_EVENT_WAKEUP 4 > +#define KVM_SYSTEM_EVENT_SUSPEND 5 > __u32 type; > __u64 flags; > } system_event; > @@ -1146,6 +1147,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_S390_MEM_OP_EXTENSION 211 > #define KVM_CAP_PMU_CAPABILITY 212 > #define KVM_CAP_DISABLE_QUIRKS2 213 > +#define KVM_CAP_ARM_SYSTEM_SUSPEND 214 > > #ifdef KVM_CAP_IRQ_ROUTING > > -- > 2.35.1.1178.g4f1659d476-goog >