Re: [RFC PATCH 20/32] KVM: PPC: Book3S HV: Nested guest entry via hypercall

David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> · Thu, 27 Sep 2018 10:57:17 +1000

On Wed, Sep 26, 2018 at 08:59:22PM +1000, Paul Mackerras wrote:
> On Wed, Sep 26, 2018 at 03:41:07PM +1000, David Gibson wrote:
> > On Fri, Sep 21, 2018 at 08:01:51PM +1000, Paul Mackerras wrote:
> > > This adds a new hypercall, H_ENTER_NESTED, which is used by a nested
> > > hypervisor to enter one of its nested guests.  The hypercall supplies
> > > register values in two structs.  Those values are copied by the level 0
> > > (L0) hypervisor (the one which is running in hypervisor mode) into the
> > > vcpu struct of the L1 guest, and then the guest is run until an
> > > interrupt or error occurs which needs to be reported to L1 via the
> > > hypercall return value.
> > > 
> > > Currently this assumes that the L0 and L1 hypervisors are the same
> > > endianness, and the structs passed as arguments are in native
> > > endianness.
> > 
> > That's nasty.  It'd be good to at least detect this and bail.
> 
> OK, so I need to reword the commit message, because we do detect this
> via the version number check, and reject the hcall in the cross-endian
> case.

Ah, ok.

> > >  	case H_ENTER_NESTED:
> > >  		ret = H_FUNCTION;
> > > +		if (!vcpu->kvm->arch.nested_enable)
> > > +			break;
> > 
> > Wouldn't H_AUTHORITY make more sense than H_FUNCTION for the
> > no-nested-allowed case?
> 
> So the guest can distinguish "I don't know about nested virt" from "I
> know about it but you're not allowed to use it", you mean?  I'm not
> sure it makes much difference in practice.

I guess not.  I think I'm just used to avoiding H_FUNCTION in qemu for
anything except "we know nothing about this hypercall whatsoever".

> > > +		ret = kvmhv_enter_nested_guest(vcpu);
> > > +		if (ret == H_INTERRUPT) {
> > > +			kvmppc_set_gpr(vcpu, 3, 0);
> > > +			return RESUME_HOST;
> > > +		}
> > >  		break;
> > >  
> > >  	default:
> > > @@ -1269,6 +1276,104 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
> > >  	return r;
> > >  }
> > >  
> > > +static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
> > 
> > Might be nice to rename to make it clear if this is the L0 or L1
> > handling for a nested exit.
> 
> Well, it's Ln handling the exit of Ln+2.  Got a suggestion for a
> better name?

Yeah, as I said elsewhere feels like we need better terms for absolute
vs. relative hypervisor levels.  What about
kvmppc_handle_guestguest_exit() or kvmppc_handle_metaguest_exit()?
That would work best if "guestguest" or "metaguest" was used to refer
to Ln+2 throughout, of course.

> > > @@ -4462,8 +4600,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> > >  	 * On POWER9, we only need to do this if the "indep_threads_mode"
> > >  	 * module parameter has been set to N.
> > >  	 */
> > > -	if (cpu_has_feature(CPU_FTR_ARCH_300))
> > > -		kvm->arch.threads_indep = indep_threads_mode;
> > > +	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > > +		if (!indep_threads_mode && !cpu_has_feature(CPU_FTR_HVMODE)) {
> > > +			pr_warn("KVM: Ignoring indep_threads_mode=N in nested hypervisor\n");
> > > +			kvm->arch.threads_indep = true;
> > 
> > Wouldn't it be cleaner to enforce this at the point indep_threads_mode
> > is set, rather than altering the value here?
> 
> I'm not sure if we get notified when the administrator changes the
> value via sysfs.  In any case, doing it this way causes a warning
> every time you start a VM, which is more likely to get noticed than a
> single warning in the middle of all the boot-time messages.

Hm, good point.

> > > +long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
> > > +{
> > > +	long int err, r;
> > > +	struct kvm_nested_guest *l2;
> > > +	struct pt_regs l2_regs, saved_l1_regs;
> > > +	struct hv_guest_state l2_hv, saved_l1_hv;
> > > +	struct kvmppc_vcore *vc = vcpu->arch.vcore;
> > > +	u64 hv_ptr, regs_ptr;
> > > +	u64 hdec_exp;
> > > +	s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
> > > +	u64 mask;
> > > +
> > > +	if (!kvm_is_radix(vcpu->kvm))
> > > +		return H_FUNCTION;
> > 
> > Would it be safer / cleaner to have this instead check that the L1 has
> > completed an H_SET_PARTITION_TABLE?  Which wouldn't be allowed for an
> > HPT guest.
> 
> There is no valid bit in the PTCR value, and the PTCR could have been
> set via the one-reg interface without there having been a
> H_SET_PARTITION_TABLE in the lifetime of this KVM instance (for
> example when the L1 guest has been migrated in from another machine),
> so I don't see a foolproof way to do what you suggest.

Ah, right.

Would checking if the vPTCR is != it's initial value (0?) do the job?
(AFAICT PTCR==0 would be unusable, even if technically allowed).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Attachment:
signature.asc

Description: PGP signature