On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 19:14:59 +0100,
> > Oliver Upton <oupton@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@xxxxxxxxxx> wrote:
> > > >
> > > > [...]
> > > >
> > > > Thanks for including me, Marc. I think you've mentioned all the examples
> > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > can think of. While some of the examples like get-reg-list could
> > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > >
> > > > I agree with giving userspace control over the exposure of the hypercalls
> > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > CAPs also seems reasonable to me.
> > > >
> > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > point out that the opt-out approach doesn't help much with expanding
> > > > our migration support unless we require the VMM to be upgraded first.
> > > >
> > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > work as expected, since the source enforces opt-out, but the destination
> > > > won't.
> > >
> > > Right, there's going to need to be a fence in both kernel and VMM
> > > versions. Before the fence, you can't roll back with either component.
> > > Once on the other side of the fence, the user may freely migrate
> > > between kernel + VMM combinations.
> > >
> > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > leaving it to the user to ensure they consider everything. Opt-in
> > > > usually only needs the user to consider what machine type they want to
> > > > launch.
> > >
> > > Going the register route will implicitly require opt-out for all old
> > > hypercalls. We exposed them unconditionally to the guest before, and
> > > we must uphold that behavior. The default value for the bitmap will
> > > have those features set. Any hypercalls added after that register
> > > interface will then require explicit opt-in from userspace.
> >
> > I disagree here. This makes the ABI inconsistent, and means that no
> > feature can be implemented without changing userspace. If you can deal
> > with the existing features, you should be able to deal with the next
> > lot.
> >
> > > With regards to the pseudoregister interface, how would a VMM discover
> > > new bits? From my perspective, you need to have two bitmaps that the
> > > VMM can get at: the set of supported feature bits and the active
> > > bitmap of features for a running guest.
> >
> > My proposal is that we have a single pseudo-register exposing the list
> > of features implemented by the kernel. Clear the bits you don't want,
> > and write back the result. As long as you haven't written anything, you
> > have the full feature set. That's pretty similar to the virtio feature
> > negotiation.
>
> Ah, yes, I agree. Thinking about it more, we will not need something
> similar to KVM_GET_SUPPORTED_CPUID.
>
> So then, for any register where userspace/KVM need to negotiate
> features, the default value will return the maximum feature set that is
> supported. If userspace wants to constrain features, read out the
> register, make sure everything you want is there, and write it back
> blowing away the superfluous bits. Given this, should we enforce ordering
> on feature registers, such that a VMM can only write to the registers
> before a VM is started?

That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint, so
we can model the feature register control off that.

>
> Also, Reiji is working on making the identity registers writable for the
> sake of feature restriction.
> The suggested negotiation interface would be applicable there too, IMO.

This is interesting news. I'll look forward to the posting.

>
> Many thanks to both you and Drew for working this out with me.
>

Thanks,
drew
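For illustration only: the negotiation agreed on in this thread can be modeled in a few lines of C. The register defaults to the full set the kernel implements. Userspace reads it, checks that what it needs is present, clears the bits it doesn't want and writes the result back, and writes are refused once the VM has started, in the spirit of KVM_REG_ARM64_SVE_VLS. This is a userspace model, not KVM code. `struct feat_reg`, the `FEAT_*` bits, every helper below, and the choice of -EBUSY/-EINVAL/-EOPNOTSUPP are hypothetical. A real VMM would instead go through KVM_GET_ONE_REG/KVM_SET_ONE_REG on whatever register ID KVM ends up defining.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits: the thread does not define any. */
#define FEAT_A (1ULL << 0)
#define FEAT_B (1ULL << 1)
#define FEAT_C (1ULL << 2)

/* Hypothetical model of one feature pseudo-register of a VM. */
struct feat_reg {
	uint64_t supported;	/* everything this kernel implements */
	uint64_t value;		/* what the guest will be offered */
	bool vm_started;	/* set once the VM has run */
};

static void feat_reg_init(struct feat_reg *r, uint64_t supported)
{
	r->supported = supported;
	r->value = supported;	/* nothing written yet: full feature set */
	r->vm_started = false;
}

static uint64_t feat_reg_get(const struct feat_reg *r)
{
	return r->value;
}

static int feat_reg_set(struct feat_reg *r, uint64_t val)
{
	if (r->vm_started)
		return -EBUSY;	/* assumed: writes only before the VM starts */
	if (val & ~r->supported)
		return -EINVAL;	/* assumed: cannot enable unimplemented bits */
	r->value = val;
	return 0;
}

static void feat_reg_mark_started(struct feat_reg *r)
{
	r->vm_started = true;
}

/*
 * VMM side: read the register, make sure everything required is there,
 * then write it back with the unwanted bits cleared.
 */
static int vmm_negotiate(struct feat_reg *r, uint64_t required,
			 uint64_t wanted)
{
	uint64_t cur = feat_reg_get(r);

	if ((cur & required) != required)
		return -EOPNOTSUPP;
	return feat_reg_set(r, cur & wanted);
}
```

With this shape, a VMM that never writes the register keeps the full feature set, while a VMM that must match an older migration destination clears the newer bits before the VM first runs.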