Hi Marc, Thanks for reviewing the series. ACK to the nits and smaller comments you've made, I'll incorporate that feedback in the next series. On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote: > On Wed, 23 Feb 2022 04:18:34 +0000, > Oliver Upton <oupton@xxxxxxxxxx> wrote: > > > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows > > software to request that a system be placed in the deepest possible > > low-power state. Effectively, software can use this to suspend itself to > > RAM. Note that the semantics of this PSCI call are very similar to > > CPU_SUSPEND, which is already implemented in KVM. > > > > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the > > low-power state is implemented as a guest WFI. Synchronously reset the > > calling CPU before entering the WFI, such that the vCPU may immediately > > resume execution when a wakeup event is recognized. > > > > Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx> > > --- > > arch/arm64/kvm/psci.c | 51 ++++++++++++++++++++++++++++++++++++++++++ > > arch/arm64/kvm/reset.c | 3 ++- > > 2 files changed, 53 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c > > index 77a00913cdfd..41adaaf2234a 100644 > > --- a/arch/arm64/kvm/psci.c > > +++ b/arch/arm64/kvm/psci.c > > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu) > > kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET); > > } > > > > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu) > > +{ > > + struct vcpu_reset_state reset_state; > > + struct kvm *kvm = vcpu->kvm; > > + struct kvm_vcpu *tmp; > > + bool denied = false; > > + unsigned long i; > > + > > + reset_state.pc = smccc_get_arg1(vcpu); > > + if (!kvm_ipa_valid(kvm, reset_state.pc)) { > > + smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0); > > + return 1; > > + } > > + > > + reset_state.r0 = smccc_get_arg2(vcpu); > > + reset_state.be = kvm_vcpu_is_be(vcpu); > > + reset_state.reset = true; > > + > > + /* > > + * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the > > + * calling vCPU) be in an OFF state, as determined by the > > + * implementation. > > + * > > + * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details. > > + */ > > + mutex_lock(&kvm->lock); > > + kvm_for_each_vcpu(i, tmp, kvm) { > > + if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) { > > + denied = true; > > + break; > > + } > > + } > > + mutex_unlock(&kvm->lock); > > This looks dodgy. Nothing seems to prevent userspace from setting the > mp_state to RUNNING in parallel with this, as only the vcpu mutex is > held when this ioctl is issued. > > It looks to me that what you want is what lock_all_vcpus() does > (Alexandru has a patch moving it out of the vgic code as part of his > SPE series). > > It is also pretty unclear what the interaction with userspace is once > you have released the lock. If the VMM starts a vcpu other than the > suspending one, what is its state? The spec doesn't see to help > here. I can see two options: > > - either all the vcpus have the same reset state applied to them as > they come up, unless they are started with CPU_ON by a vcpu that has > already booted (but there is a single 'context_id' provided, and I > fear this is going to confuse the OS)... > > - or only the suspending vcpu can resume the system, and we must fail > a change of mp_state for the other vcpus. > > What do you think? Definitely the latter. The documentation of SYSTEM_SUSPEND is quite shaky on this, but it would appear that the intention is for the caller to be the first CPU to wake up. > > + > > + if (denied) { > > + smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0); > > + return 1; > > + } > > + > > + __kvm_reset_vcpu(vcpu, &reset_state); > > + kvm_vcpu_wfi(vcpu); > > I have mixed feelings about this. The vcpu has reset before being in > WFI, while it really should be the other way around and userspace > could rely on observing the transition. > > What breaks if you change this? I don't think that userspace would be able to observe the transition even if we WFI before the reset. I imagine that would take the form of setting KVM_REQ_VCPU_RESET, which we explicitly handle before letting userspace access the vCPU's state as of commit 6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace touches vCPU state"). Given this, I felt it was probably best to avoid all the indirection and just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does, however, imply that we have slightly different behavior when userspace exits are enabled, as that will happen pre-reset and pre-WFI. -- Oliver