Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list

Alexander Graf <graf@xxxxxxxxxx> · Sat, 8 Aug 2020 00:49:17 +0200

On 07.08.20 23:16, Jim Mattson wrote:

On Fri, Aug 7, 2020 at 9:12 AM Alexander Graf <graf@xxxxxxxxxx> wrote:



On 04.08.20 06:20, Aaron Lewis wrote:
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



Add KVM_SET_EXIT_MSRS ioctl to allow userspace to pass in a list of MSRs
that force an exit to userspace when rdmsr or wrmsr are used by the
guest.

KVM_SET_EXIT_MSRS will need to be called before any vCPUs are
created to protect the 'user_exit_msrs' list from being mutated while
vCPUs are running.

Add KVM_CAP_SET_MSR_EXITS to identify the feature exists.

Signed-off-by: Aaron Lewis <aaronlewis@xxxxxxxxxx>
Reviewed-by: Oliver Upton <oupton@xxxxxxxxxx>
---
   Documentation/virt/kvm/api.rst  | 24 +++++++++++++++++++
   arch/x86/include/asm/kvm_host.h |  2 ++
   arch/x86/kvm/x86.c              | 41 +++++++++++++++++++++++++++++++++
   include/uapi/linux/kvm.h        |  2 ++
   4 files changed, 69 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 320788f81a05..7d8167c165aa 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1006,6 +1006,30 @@ such as migration.
   :Parameters: struct kvm_vcpu_event (out)
   :Returns: 0 on success, -1 on error

+4.32 KVM_SET_EXIT_MSRS
+------------------
+
+:Capability: KVM_CAP_SET_MSR_EXITS
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_msr_list (in)
+:Returns: 0 on success, -1 on error
+
+Sets the userspace MSR list which is used to track which MSRs KVM should send
+to userspace to be serviced when the guest executes rdmsr or wrmsr.

Unfortunately this doesn't solve the whole "ignore_msrs" mess that we
have today. How can I just say "tell user space about unhandled MSRs"?
And if I add that on top of this mechanism, how do we not make the list
of MSRs that are handled in-kernel an ABI?

Jumping in for Aaron, who is out this afternoon...

Awesome, thanks for the super quick reply!


This patch doesn't attempt to solve your problem, "tell user space
about unhandled MSRs." It attempts to solve our problem, "tell user
space about a specific set of MSRs, even if kvm learns to handle them
in the future." This is, in fact, what we really wanted to do when
Peter Hornyack implemented the "tell user space about unhandled MSRs"
changes in 2015. We just didn't realize it at the time.

Ok, let's take a step back then. What exactly are you trying to solve 
and which MSRs do you care about?

My main problem with a deny list approach is that it's (practically) 
impossible to map the allow list use case onto it. It is however trivial 
to only deny a few select MSRs explicitly with an allow list approach. I 
don't want to introduce yet another ABI to KVM in 2 years to then have 
both :).

Though your proposal does partially solve our problem, it's a lot
easier to specify a small set of MSRs by inclusion than it is by
exclusion. (Where your proposal falls short of our needs is when
userspace wants to handle an MSR that kvm would not typically
intercept at all.)

The only MSRs that KVM does not intercept are the ones explicitly 
specified as pass-through


+
+This ioctl needs to be called before vCPUs are setup otherwise the list of MSRs
+will not be accepted and an EINVAL error will be returned.  Also, if a list of
+MSRs has already been supplied, and this ioctl is called again an EEXIST error
+will be returned.
+
+::
+
+  struct kvm_msr_list {
+  __u32 nmsrs;
+  __u32 indices[0];
+};
+
   X86:
   ^^^^

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index be5363b21540..510055471dd0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1004,6 +1004,8 @@ struct kvm_arch {

          struct kvm_pmu_event_filter *pmu_event_filter;
          struct task_struct *nx_lpage_recovery_thread;
+
+       struct kvm_msr_list *user_exit_msrs;
   };

   struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88c593f83b28..46a0fb9e0869 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3419,6 +3419,42 @@ static inline bool kvm_can_mwait_in_guest(void)
                  boot_cpu_has(X86_FEATURE_ARAT);
   }

+static int kvm_vm_ioctl_set_exit_msrs(struct kvm *kvm,
+                                     struct kvm_msr_list __user *user_msr_list)
+{
+       struct kvm_msr_list *msr_list, hdr;
+       size_t indices_size;
+
+       if (kvm->arch.user_exit_msrs != NULL)
+               return -EEXIST;
+
+       if (kvm->created_vcpus)
+               return -EINVAL;

What if we need to change the set of user space handled MSRs
dynamically, for example because a feature has been enabled through a
previous MSR?

It's never been an issue for us, and this approach avoids the
messiness of having to back out the old changes to the MSR permissions
bitmaps, which is fraught. It can be done, but I would question the
ROI on the additional complexity. In any case, I think a VCPU ioctl
would be more appropriate than a VM ioctl if dynamic modifications
were allowed. For instance, in your example, the previous MSR write
that enabled a feature requiring a new user space MSR exit would
likely apply only to the VCPU on which that previous MSR write
happened.

My general rule of thumb with PPC has always been to make as much vCPU 
specific as possible. X86 KVM is a bit different there - and I guess it 
does help for memory consumption :).


+
+       if (copy_from_user(&hdr, user_msr_list, sizeof(hdr)))
+               return -EFAULT;
+
+       if (hdr.nmsrs >= MAX_IO_MSRS)

This constant is not defined inside the scope of this patch. Is your
patchset fully bisectable? Please make sure that every patch builds
successfully on its own :).

I can't vouch for this patchset being fully bisectable, but this
particular constant was moved to x86.c ~13 years ago, in commit
313a3dc75da20 ("KVM: Portability: split kvm_vcpu_ioctl"). It should
not cause any build issues. :)

Ugh, sorry, my bad :).


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879