Hello Steve, On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote: > On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote: > > > > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote: > > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote: > > > > > > > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote: > > > > > [+Marc] > > > > > > > > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote: > > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote: > > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote: > > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote: > > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote: > > > > > > > > > Thanks for grabbing the data! > > > > > > > > > > > > > > > > > > I am fine with both paths. Sean has stated an explicit desire for > > > > > > > > > hypercall exiting, so I think that would be the current consensus. > > > > > > > > > > > > > > Yep, though it'd be good to get Paolo's input, too. > > > > > > > > > > > > > > > > If we want to do hypercall exiting, this should be in a follow-up > > > > > > > > > series where we implement something more generic, e.g. a hypercall > > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall > > > > > > > > > exit route, we can drop the kvm side of the hypercall. > > > > > > > > > > > > > > I don't think this is a good candidate for arbitrary hypercall interception. Or > > > > > > > rather, I think hypercall interception should be an orthogonal implementation. > > > > > > > > > > > > > > The guest, including guest firmware, needs to be aware that the hypercall is > > > > > > > supported, and the ABI needs to be well-defined. Relying on userspace VMMs to > > > > > > > implement a common ABI is an unnecessary risk. > > > > > > > > > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but > > > > > > > require further VMM intervention. But, I just don't see the point, it would > > > > > > > save only a few lines of code. It would also limit what KVM could do in the > > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace, > > > > > > > then mandatory interception would essentially make it impossible for KVM to do > > > > > > > bookkeeping while still honoring the interception request. > > > > > > > > > > > > > > However, I do think it would make sense to have the userspace exit be a generic > > > > > > > exit type. But hey, we already have the necessary ABI defined for that! It's > > > > > > > just not used anywhere. > > > > > > > > > > > > > > /* KVM_EXIT_HYPERCALL */ > > > > > > > struct { > > > > > > > __u64 nr; > > > > > > > __u64 args[6]; > > > > > > > __u64 ret; > > > > > > > __u32 longmode; > > > > > > > __u32 pad; > > > > > > > } hypercall; > > > > > > > > > > > > > > > > > > > > > > > Userspace could also handle the MSR using MSR filters (would need to > > > > > > > > > confirm that). Then userspace could also be in control of the cpuid bit. > > > > > > > > > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data. > > > > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but > > > > > > > that will result in truly heinous guest code, and extensibility issues. > > > > > > > > > > > We may also need to pass-through the MSR to userspace, as it is a part of this > > complete host (userspace/kernel), OVMF and guest kernel negotiation of > > the SEV live migration feature. > > > > Host (userspace/kernel) advertises it's support for SEV live migration > > feature via the CPUID bits, which is queried by OVMF and which in turn > > adds a new UEFI runtime variable to indicate support for SEV live > > migration, which is later queried during guest kernel boot and > > accordingly the guest does a wrmrsl() to custom MSR to complete SEV > > live migration negotiation and enable it. > > > > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write > > enables SEV live migration, hence, preventing userspace to start live > > migration before the feature support has been negotiated and enabled on > > all the three components - host, guest OVMF and kernel. > > > > But, now with this ioctl not existing anymore, we will need to > > pass-through the MSR to userspace too, for it to only initiate live > > migration once the feature negotiation has been completed. > > I can't tell if you were waiting for feedback on this before posting > the follow-up patch series. Actually, i am going to post the follow-up patch series upstream early next week. I have already added support for MSR handling and exit to userspace, the current implementation looks like this : The custom MSR is hooked both in svm_get_msr() and svm_set_msr(): @@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) case MSR_F10H_DECFG: msr_info->data = svm->msr_decfg; break; + case MSR_KVM_SEV_LIVE_MIGRATION: + if (!sev_guest(vcpu->kvm)) + return 1; + + if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION)) + return 1; + + /* + * Let userspace handle the MSR using MSR filters. + */ + return KVM_MSR_RET_FILTERED; So there is special check added for both sev_guest() and requisite CPUID bit set in guest CPUID, if either fails this will signal #GP to guest. Otherwise, it returns MSR_FILTER return code, which will allow userspace to use msr intercepts to handle the reads and writes via userspace exits using KVM_EXIT_X86_RDMSR/KVM_EXIT_X86_WRMSR. Let me know if you have any feedback/comments on the above handling. Thanks, Ashish > > Here are a few options: > 1) Add the MSR explicitly to the list of custom kvm MSRs, but don't > have it hooked up anywhere. The expectation would be for the VMM to > use msr intercepts to handle the reads and writes. If that seems > weird, have svm_set_msr (or whatever) explicitly ignore it. > 2) Add a getter and setter for the MSR. Only allow guests to use it if > they are sev_guests with the requisite CPUID bit set. > > I think I prefer the former, and it should work fine from my > understanding of the msr intercepts implementation. I'm also open to > other ideas. You could also have the MSR write trigger a KVM_EXIT of > the same type as the hypercall, but have it just say "the msr value > changed to XYZ", but that design sounds awkward. > > Thanks, > Steve