On Fri, Apr 03, 2020 at 12:20:26PM +1000, Nicholas Piggin wrote:
> Gautham R. Shenoy's on March 31, 2020 10:10 pm:
> > From: "Gautham R. Shenoy" <ego@xxxxxxxxxxxxxxxxxx>
> >
> > ISA v3.0 allows the guest to execute a stop instruction. For this, the
> > PSSCR[ESL|EC] bits need to be cleared by the hypervisor before
> > scheduling in the guest vCPU.
> >
> > Currently we always schedule in a vCPU with PSSCR[ESL|EC] bits
> > set. This patch changes the behaviour to enter the guest with
> > PSSCR[ESL|EC] bits cleared. This is an RFC patch where we
> > unconditionally clear these bits. Ideally this should be done
> > conditionally on platforms where the guest stop instruction has no
> > bugs (starting POWER9 DD2.3).
>
> How will guests know that they can use this facility safely after your
> series? You need both DD2.3 and a patched KVM.

Yes, this is something that isn't addressed in this series (mentioned in
the cover letter), which is a POC demonstrating that the stop0lite state
works in the guest.

However, to answer your question, this is the scheme that I had in mind:

OPAL:
   On processors >= DD2.3, we publish a dt-cpu-feature "idle-stop-guest".

Hypervisor Kernel:
   1. If the "idle-stop-guest" dt-cpu-feature is discovered, then we
      set bool enable_guest_stop = true;

   2. During KVM guest entry, clear PSSCR[ESL|EC] iff
      enable_guest_stop == true.

   3. In kvm_vm_ioctl_check_extension(), for a new capability
      KVM_CAP_STOP, return true iff enable_guest_stop == true.

QEMU:
   Check with the hypervisor whether KVM_CAP_STOP is present. If so,
   indicate its presence to the guest via the device tree.

Guest Kernel:
   Check for the presence of guest stop state support in the
   device tree. If available, enable the stop0lite state in the
   cpuidle driver.

We still have the challenge of migrating a guest which started on a
hypervisor supporting the guest stop state to a hypervisor without it.
The target hypervisor should at least have Patch 1 of this series, so
that we don't crash the guest.

>
> >
> > Signed-off-by: Gautham R. Shenoy <ego@xxxxxxxxxxxxxxxxxx>
> > ---
> >  arch/powerpc/kvm/book3s_hv.c            |  2 +-
> >  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 25 +++++++++++++------------
> >  2 files changed, 14 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index cdb7224..36d059a 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3424,7 +3424,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> >  	mtspr(SPRN_IC, vcpu->arch.ic);
> >  	mtspr(SPRN_PID, vcpu->arch.pid);
> >
> > -	mtspr(SPRN_PSSCR, vcpu->arch.psscr | PSSCR_EC |
> > +	mtspr(SPRN_PSSCR, (vcpu->arch.psscr & ~(PSSCR_EC | PSSCR_ESL)) |
> >  	      (local_paca->kvm_hstate.fake_suspend << PSSCR_FAKE_SUSPEND_LG));
> >
> >  	mtspr(SPRN_HFSCR, vcpu->arch.hfscr);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index dbc2fec..c2daec3 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -823,6 +823,18 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> >  	mtspr	SPRN_PID, r7
> >  	mtspr	SPRN_WORT, r8
> >  BEGIN_FTR_SECTION
> > +	/* POWER9-only registers */
> > +	ld	r5, VCPU_TID(r4)
> > +	ld	r6, VCPU_PSSCR(r4)
> > +	lbz	r8, HSTATE_FAKE_SUSPEND(r13)
> > +	lis	r7, (PSSCR_EC | PSSCR_ESL)@h /* Allow guest to call stop */
> > +	andc	r6, r6, r7
> > +	rldimi	r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > +	ld	r7, VCPU_HFSCR(r4)
> > +	mtspr	SPRN_TIDR, r5
> > +	mtspr	SPRN_PSSCR, r6
> > +	mtspr	SPRN_HFSCR, r7
> > +FTR_SECTION_ELSE
>
> Why did you move these around? Just because the POWER9 section became
> larger than the other?

Yes.

>
> That's a real wart in the instruction patching implementation, I think
> we can fix it by padding with nops in the macros.
>
> Can you just add the additional required nops to the top branch without
> changing them around for this patch, so it's easier to see what's going on?
> The end result will be the same after patching. Actually changing these
> around can have a slight unintended consequence in that code that runs
> before features were patched will execute the IF code. Not a problem
> here, but another reason why the instruction patching restriction is
> annoying.

Sure, I will repost this patch with additional nops instead of moving
them around.

>
> Thanks,
> Nick
>
> >  	/* POWER8-only registers */
> >  	ld	r5, VCPU_TCSCR(r4)
> >  	ld	r6, VCPU_ACOP(r4)
> > @@ -833,18 +845,7 @@ BEGIN_FTR_SECTION
> >  	mtspr	SPRN_CSIGR, r7
> >  	mtspr	SPRN_TACR, r8
> >  	nop
> > -FTR_SECTION_ELSE
> > -	/* POWER9-only registers */
> > -	ld	r5, VCPU_TID(r4)
> > -	ld	r6, VCPU_PSSCR(r4)
> > -	lbz	r8, HSTATE_FAKE_SUSPEND(r13)
> > -	oris	r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
> > -	rldimi	r6, r8, PSSCR_FAKE_SUSPEND_LG, 63 - PSSCR_FAKE_SUSPEND_LG
> > -	ld	r7, VCPU_HFSCR(r4)
> > -	mtspr	SPRN_TIDR, r5
> > -	mtspr	SPRN_PSSCR, r6
> > -	mtspr	SPRN_HFSCR, r7
> > -ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
> > +ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
> >  8:
> >
> >  	ld	r5, VCPU_SPRG0(r4)
> > --
> > 1.9.4
> >
> >

--
Thanks and Regards
gautham.
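For illustration, the PSSCR value loaded on guest entry in the patch above (and made conditional in step 2 of the proposed scheme) can be sketched in plain C. This is a minimal sketch, not kernel code: the PSSCR_EC, PSSCR_ESL and PSSCR_FAKE_SUSPEND_LG values are assumed copies of the definitions in arch/powerpc/include/asm/reg.h, and guest_entry_psscr() plus its enable_guest_stop parameter are hypothetical names standing in for the proposed flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit definitions, copied for illustration from
 * arch/powerpc/include/asm/reg.h; verify against your tree.
 */
#define PSSCR_EC              0x00100000ULL /* Exit Criterion */
#define PSSCR_ESL             0x00200000ULL /* Enable State Loss */
#define PSSCR_FAKE_SUSPEND_LG 10

/*
 * Hypothetical helper: compute the PSSCR value loaded on guest entry.
 * enable_guest_stop models the proposed flag that would be set when
 * OPAL advertises the "idle-stop-guest" dt-cpu-feature.
 */
static uint64_t guest_entry_psscr(uint64_t vcpu_psscr, uint8_t fake_suspend,
				  bool enable_guest_stop)
{
	uint64_t psscr;

	if (enable_guest_stop)
		/* New behaviour: clear ESL|EC so the guest may execute stop. */
		psscr = vcpu_psscr & ~(PSSCR_EC | PSSCR_ESL);
	else
		/* Existing behaviour: force EC so a guest stop traps to HV. */
		psscr = vcpu_psscr | PSSCR_EC;

	/*
	 * Mirror the rldimi in the assembly path: replace bit
	 * PSSCR_FAKE_SUSPEND_LG with the low bit of fake_suspend.
	 */
	psscr &= ~(1ULL << PSSCR_FAKE_SUSPEND_LG);
	psscr |= (uint64_t)(fake_suspend & 1) << PSSCR_FAKE_SUSPEND_LG;

	return psscr;
}
```

The rldimi-style insert differs slightly from the C path in kvmhv_load_hv_regs_and_go(), which ORs fake_suspend in rather than replacing the bit; the two agree whenever the saved vCPU PSSCR has that bit clear.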
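The capability chain proposed in the reply (OPAL feature, host flag, KVM capability, QEMU device tree, guest cpuidle) can be modelled as a small C sketch. Everything in it is hypothetical: KVM_CAP_STOP does not exist yet, so its number here is a placeholder; check_extension() is a simplified stand-in for kvm_vm_ioctl_check_extension(); and the QEMU and guest steps are reduced to boolean checks, with no real device-tree property names.

```c
#include <stdbool.h>
#include <string.h>

/* Placeholder number: KVM_CAP_STOP is only proposed, not a real capability. */
#define KVM_CAP_STOP_HYPOTHETICAL 0x7fff

/* Step 1: host kernel flag, set if OPAL published "idle-stop-guest". */
static bool enable_guest_stop;

/* Host: called for each dt-cpu-feature discovered at boot (simplified). */
static void host_scan_dt_cpu_feature(const char *feature)
{
	if (strcmp(feature, "idle-stop-guest") == 0)
		enable_guest_stop = true;
}

/* Step 3: simplified stand-in for kvm_vm_ioctl_check_extension(). */
static int check_extension(long ext)
{
	switch (ext) {
	case KVM_CAP_STOP_HYPOTHETICAL:
		return enable_guest_stop ? 1 : 0;
	default:
		return 0;
	}
}

/* QEMU: advertise guest stop support in the guest DT iff the cap is present. */
static bool qemu_advertises_guest_stop(void)
{
	return check_extension(KVM_CAP_STOP_HYPOTHETICAL) > 0;
}

/* Guest: register the stop0lite cpuidle state only if the DT advertised it. */
static bool guest_registers_stop0lite(bool dt_advertises_guest_stop)
{
	return dt_advertises_guest_stop;
}
```

Because every layer defaults to "not supported", a guest only uses stop0lite when all four layers agree; the migration problem mentioned above arises because this check happens once, at guest boot.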