Nicholas Piggin <npiggin@xxxxxxxxx> writes:

> Excerpts from Fabiano Rosas's message of March 6, 2021 9:10 am:
>> As one of the arguments of the H_ENTER_NESTED hypercall, the nested
>> hypervisor (L1) prepares a structure containing the values of various
>> hypervisor-privileged registers with which it wants the nested guest
>> (L2) to run. Since the nested HV runs in supervisor mode it needs the
>> host to write to these registers.
>>
>> To stop a nested HV manipulating this mechanism and using a nested
>> guest as a proxy to access a facility that has been made unavailable
>> to it, we have a routine that sanitises the values of the HV registers
>> before copying them into the nested guest's vcpu struct.
>>
>> However, when coming out of the guest the values are copied as they
>> were back into L1 memory, which means that any sanitisation we did
>> during guest entry will be exposed to L1 after H_ENTER_NESTED returns.
>>
>> This is not a problem by itself, but in the case of the Hypervisor
>> Facility Status and Control Register (HFSCR), we use the intersection
>> between L2 hfscr bits and L1 hfscr bits. That means that L1 could use
>> this to indirectly read the (hv-privileged) value from its vcpu
>> struct.
>>
>> This patch fixes this by making sure that L1 only gets back the bits
>> that are necessary for regular functioning.
>
> The general idea of restricting exposure of HV privileged bits makes
> sense, but for the case of HFSCR a guest can probe the HFSCR anyway by
> testing which facilities are available (and presumably an HV may need
> some way to know what features are available for it to advertise to
> its own guests), so is this necessary? Perhaps a comment would be
> sufficient.
>

Well, I'd be happy to force them through the arduous path then =); and
there are features that are emulated by the HV which L1 would not be
able to probe.

I think we should implement a mechanism that stops all leaks now, rather
than having to ponder about this every time we touch an hv_reg in that
structure. I'm not too worried about HFSCR specifically.

Let me think about this some more and see if I can make it more generic.
I realise that sticking the saved_hfscr on the side is not the most
elegant approach.

> Thanks,
> Nick
>
>>
>> Signed-off-by: Fabiano Rosas <farosas@xxxxxxxxxxxxx>
>> ---
>>  arch/powerpc/kvm/book3s_hv_nested.c | 22 +++++++++++++++++-----
>>  1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
>> index 0cd0e7aad588..860004f46e08 100644
>> --- a/arch/powerpc/kvm/book3s_hv_nested.c
>> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
>> @@ -98,12 +98,20 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
>>  }
>>
>>  static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>> -				  struct hv_guest_state *hr)
>> +				  struct hv_guest_state *hr, u64 saved_hfscr)
>>  {
>>  	struct kvmppc_vcore *vc = vcpu->arch.vcore;
>>
>> +	/*
>> +	 * During sanitise_hv_regs() we used HFSCR bits from L1 state
>> +	 * to restrict what the L2 state is allowed to be. Since L1 is
>> +	 * not allowed to read this SPR, do not include these
>> +	 * modifications in the return state.
>> +	 */
>> +	hr->hfscr = ((~HFSCR_INTR_CAUSE & saved_hfscr) |
>> +		     (HFSCR_INTR_CAUSE & vcpu->arch.hfscr));
>> +
>>  	hr->dpdes = vc->dpdes;
>> -	hr->hfscr = vcpu->arch.hfscr;
>>  	hr->purr = vcpu->arch.purr;
>>  	hr->spurr = vcpu->arch.spurr;
>>  	hr->ic = vcpu->arch.ic;
>> @@ -132,12 +140,14 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
>>  	}
>>  }
>>
>> -static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
>> +static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr,
>> +			     u64 *saved_hfscr)
>>  {
>>  	/*
>>  	 * Don't let L1 enable features for L2 which we've disabled for L1,
>>  	 * but preserve the interrupt cause field.
>>  	 */
>> +	*saved_hfscr = hr->hfscr;
>>  	hr->hfscr &= (HFSCR_INTR_CAUSE | vcpu->arch.hfscr);
>>
>>  	/* Don't let data address watchpoint match in hypervisor state */
>> @@ -272,6 +282,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>>  	u64 hdec_exp;
>>  	s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
>>  	u64 mask;
>> +	u64 hfscr;
>>  	unsigned long lpcr;
>>
>>  	if (vcpu->kvm->arch.l1_ptcr == 0)
>> @@ -324,7 +335,8 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>>  	mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
>>  		LPCR_LPES | LPCR_MER;
>>  	lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
>> -	sanitise_hv_regs(vcpu, &l2_hv);
>> +
>> +	sanitise_hv_regs(vcpu, &l2_hv, &hfscr);
>>  	restore_hv_regs(vcpu, &l2_hv);
>>
>>  	vcpu->arch.ret = RESUME_GUEST;
>> @@ -345,7 +357,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>>  	delta_spurr = vcpu->arch.spurr - l2_hv.spurr;
>>  	delta_ic = vcpu->arch.ic - l2_hv.ic;
>>  	delta_vtb = vc->vtb - l2_hv.vtb;
>> -	save_hv_return_state(vcpu, vcpu->arch.trap, &l2_hv);
>> +	save_hv_return_state(vcpu, vcpu->arch.trap, &l2_hv, hfscr);
>>
>>  	/* restore L1 state */
>>  	vcpu->arch.nested = NULL;
>> --
>> 2.29.2
>>
>>
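
For anyone following along, the HFSCR handling in the quoted patch can be
sketched in userspace roughly as below. This is only an illustration of the
two bit operations, not kernel code: the helper names sanitise_hfscr() and
return_hfscr() are hypothetical, plain uint64_t values stand in for the vcpu
and hv_guest_state fields, and the HFSCR_INTR_CAUSE value (top byte of the
register) is an assumption made for the sketch.

```c
#include <stdint.h>

/* Assumption for this sketch: the interrupt cause field is bits 63:56. */
#define HFSCR_INTR_CAUSE (0xFFULL << 56)

/*
 * Entry path (hypothetical helper): remember what L1 asked for, then
 * restrict the value L2 will run with to the facilities the host has
 * enabled for L1. The interrupt cause field is always preserved.
 */
uint64_t sanitise_hfscr(uint64_t l1_requested, uint64_t host_allowed,
			uint64_t *saved)
{
	*saved = l1_requested;
	return l1_requested & (HFSCR_INTR_CAUSE | host_allowed);
}

/*
 * Exit path (hypothetical helper): hand L1 back its own facility bits
 * unchanged, and take only the interrupt cause field from the value
 * L2 actually ran with.
 */
uint64_t return_hfscr(uint64_t saved, uint64_t l2_current)
{
	return (~HFSCR_INTR_CAUSE & saved) |
	       (HFSCR_INTR_CAUSE & l2_current);
}
```

With these helpers, if L1 requests facility bits 0x3C while the host allows
only 0x0F, L2 runs with 0x0C, but the value returned to L1 still carries
0x3C in the facility bits, so the host's mask is not visible in the return
state. Only the interrupt cause byte reflects what happened while L2 ran.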