Re: [PATCH 1/6] KVM: x86: Add ioctl for accepting a userspace provided MSR list

Jim Mattson <jmattson@xxxxxxxxxx> · Fri, 7 Aug 2020 17:39:27 -0700

On Fri, Aug 7, 2020 at 3:49 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
>
> On 07.08.20 23:16, Jim Mattson wrote:
> > This patch doesn't attempt to solve your problem, "tell user space
> > about unhandled MSRs." It attempts to solve our problem, "tell user
> > space about a specific set of MSRs, even if kvm learns to handle them
> > in the future." This is, in fact, what we really wanted to do when
> > Peter Hornyack implemented the "tell user space about unhandled MSRs"
> > changes in 2015. We just didn't realize it at the time.
>
> Ok, let's take a step back then. What exactly are you trying to solve
> and which MSRs do you care about?

Our userspace handles a handful of MSRs, including some of the
synthetic PV time-related MSRs (both kvm and Hyper-V), which we want
to exit to userspace even though kvm thinks it knows what to do with
them. We also have a range of Google-specific MSRs that kvm will
probably never know about. And, finally, we find that as time goes on,
there are situations where we want to add support for a new MSR in
userspace because we can roll out a new userspace more quickly than a
new kernel. An example of this last category is
MSR_IA32_ARCH_CAPABILITIES. We don't necessarily want to cede control
of such MSRs to kvm as soon as it knows about them. For instance, we
may want to wait until the new kernel reaches its rollback horizon.
There may be a few more MSRs we handle in userspace, but I think that
covers the bulk of them.

I don't believe we have yet seen a case where we wanted to take
control of an MSR that kvm didn't intercept itself. That part of
Aaron's patch may be overkill, but it gives the API clean semantics.
It seems a bit awkward to have a userspace MSR list where accesses to
some of the MSRs exit to userspace and accesses to others do not. (I
believe that your implementation of the allow list suffers from this
awkwardness, though I haven't yet had time to review it in depth.)

> My main problem with a deny list approach is that it's (practically)
> impossible to map the allow list use case onto it. It is however trivial
> to only deny a few select MSRs explicitly with an allow list approach. I
> don't want to introduce yet another ABI to KVM in 2 years to then have
> both :).

I agree in part. A deny list does not support your use case well. But
does an allow list support it that much better? I suspect that the
allow list that specifies "every MSR that kvm knows about today" is
far from trivial to construct, especially if you consider different
microarchitectures and different VM configurations (e.g. vPMU vs. no
vPMU).

Can we back up and ask what it is that you want to accomplish? If you
simply want ignore_msrs to be per-VM, wouldn't it be easier to add an
ioctl to simply set a boolean that potentially overrides the module
parameter, and then modify the ignore_msrs code to consult the per-VM
boolean?

Getting back to one of the issues I raised earlier: your "exit instead
of #GP" patch exits to userspace on both unknown MSRs and illegal
WRMSR values. If userspace wants to implement "ignore MSRs" for a
particular VM, it's absolutely trivial when the reason for the exit is
"unknown MSR." However, it becomes more complicated when the reason
for the exit is "#GP." Now, userspace has to know which MSRs are known
to kvm before it can decide whether or not the access should #GP.

I hope this doesn't come across sounding antagonistic. Like you, I
want to avoid revisiting this design a few years down the road. We're
already in that boat with the 2015 "exit on unknown MSR" change!