On Wed, Sep 26, 2018 at 08:59:22PM +1000, Paul Mackerras wrote: > On Wed, Sep 26, 2018 at 03:41:07PM +1000, David Gibson wrote: > > On Fri, Sep 21, 2018 at 08:01:51PM +1000, Paul Mackerras wrote: > > > This adds a new hypercall, H_ENTER_NESTED, which is used by a nested > > > hypervisor to enter one of its nested guests. The hypercall supplies > > > register values in two structs. Those values are copied by the level 0 > > > (L0) hypervisor (the one which is running in hypervisor mode) into the > > > vcpu struct of the L1 guest, and then the guest is run until an > > > interrupt or error occurs which needs to be reported to L1 via the > > > hypercall return value. > > > > > > Currently this assumes that the L0 and L1 hypervisors are the same > > > endianness, and the structs passed as arguments are in native > > > endianness. > > > > That's nasty. It'd be good to at least detect this and bail. > > OK, so I need to reword the commit message, because we do detect this > via the version number check, and reject the hcall in the cross-endian > case. Ah, ok. > > > case H_ENTER_NESTED: > > > ret = H_FUNCTION; > > > + if (!vcpu->kvm->arch.nested_enable) > > > + break; > > > > Wouldn't H_AUTHORITY make more sense than H_FUNCTION for the > > no-nested-allowed case? > > So the guest can distinguish "I don't know about nested virt" from "I > know about it but you're not allowed to use it", you mean? I'm not > sure it makes much difference in practice. I guess not. I think I'm just used to avoiding H_FUNCTION in qemu for anything except "we know nothing about this hypercall whatsoever". > > > + ret = kvmhv_enter_nested_guest(vcpu); > > > + if (ret == H_INTERRUPT) { > > > + kvmppc_set_gpr(vcpu, 3, 0); > > > + return RESUME_HOST; > > > + } > > > break; > > > > > > default: > > > @@ -1269,6 +1276,104 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu, > > > return r; > > > } > > > > > > +static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu) > > > > Might be nice to rename to make it clear if this is the L0 or L1 > > handling for a nested exit. > > Well, it's Ln handling the exit of Ln+2. Got a suggestion for a > better name? Yeah, as I said elsewhere feels like we need better terms for absolute vs. relative hypervisor levels. What about kvmppc_handle_guestguest_exit() or kvmppc_handle_metaguest_exit()? That would work best if "guestguest" or "metaguest" was used to refer to Ln+2 throughout, of course. > > > @@ -4462,8 +4600,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm) > > > * On POWER9, we only need to do this if the "indep_threads_mode" > > > * module parameter has been set to N. > > > */ > > > - if (cpu_has_feature(CPU_FTR_ARCH_300)) > > > - kvm->arch.threads_indep = indep_threads_mode; > > > + if (cpu_has_feature(CPU_FTR_ARCH_300)) { > > > + if (!indep_threads_mode && !cpu_has_feature(CPU_FTR_HVMODE)) { > > > + pr_warn("KVM: Ignoring indep_threads_mode=N in nested hypervisor\n"); > > > + kvm->arch.threads_indep = true; > > > > Wouldn't it be cleaner to enforce this at the point indep_threads_mode > > is set, rather than altering the value here? > > I'm not sure if we get notified when the administrator changes the > value via sysfs. In any case, doing it this way causes a warning > every time you start a VM, which is more likely to get noticed than a > single warning in the middle of all the boot-time messages. Hm, good point. > > > +long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu) > > > +{ > > > + long int err, r; > > > + struct kvm_nested_guest *l2; > > > + struct pt_regs l2_regs, saved_l1_regs; > > > + struct hv_guest_state l2_hv, saved_l1_hv; > > > + struct kvmppc_vcore *vc = vcpu->arch.vcore; > > > + u64 hv_ptr, regs_ptr; > > > + u64 hdec_exp; > > > + s64 delta_purr, delta_spurr, delta_ic, delta_vtb; > > > + u64 mask; > > > + > > > + if (!kvm_is_radix(vcpu->kvm)) > > > + return H_FUNCTION; > > > > Would it be safer / cleaner to have this instead check that the L1 has > > completed an H_SET_PARTITION_TABLE? Which wouldn't be allowed for an > > HPT guest. > > There is no valid bit in the PTCR value, and the PTCR could have been > set via the one-reg interface without there having been a > H_SET_PARTITION_TABLE in the lifetime of this KVM instance (for > example when the L1 guest has been migrated in from another machine), > so I don't see a foolproof way to do what you suggest. Ah, right. Would checking if the vPTCR is != it's initial value (0?) do the job? (AFAICT PTCR==0 would be unusable, even if technically allowed). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Attachment:
signature.asc
Description: PGP signature