Re: [PATCH v2 1/3] KVM: x86: Deflect unknown MSR accesses to user space

Jim Mattson <jmattson@xxxxxxxxxx> · Thu, 30 Jul 2020 16:53:05 -0700

On Thu, Jul 30, 2020 at 4:08 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
>
>
>
> On 31.07.20 00:42, Jim Mattson wrote:
> >
> > On Wed, Jul 29, 2020 at 4:59 PM Alexander Graf <graf@xxxxxxxxxx> wrote:
> >>
> >> MSRs are weird. Some of them are normal control registers, such as EFER.
> >> Some however are registers that really are model specific, not very
> >> interesting to virtualization workloads, and not performance critical.
> >> Others again are really just windows into package configuration.
> >>
> >> Out of these MSRs, only the first category is necessary to implement in
> >> kernel space. Rarely accessed MSRs, MSRs that should be fine tuned against
> >> certain CPU models and MSRs that contain information on the package level
> >> are much better suited for user space to process. However, over time we have
> >> accumulated a lot of MSRs that are not the first category, but still handled
> >> by in-kernel KVM code.
> >>
> >> This patch adds a generic interface to handle WRMSR and RDMSR from user
> >> space. With this, any future MSR that is part of the latter categories can
> >> be handled in user space.
> >>
> >> Furthermore, it allows us to replace the existing "ignore_msrs" logic with
> >> something that applies per-VM rather than on the full system. That way you
> >> can run productive VMs in parallel to experimental ones where you don't care
> >> about proper MSR handling.
> >>
> >> Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>
> >
> > Can we just drop em_wrmsr and em_rdmsr? The in-kernel emulator is
> > already incomplete, and I don't think there is ever a good reason for
> > kvm to emulate RDMSR or WRMSR if the VM-exit was for some other reason
> > (and we shouldn't end up here if the VM-exit was for RDMSR or WRMSR).
> > Am I missing something?
>
> On certain combinations of CPUs and guest modes, such as real mode on
> pre-Nehalem(?) at least, we are running all guest code through the
> emulator and thus may encounter a RDMSR or WRMSR instruction. I *think*
> we also do so for big real mode on more modern CPUs, but I'm not 100% sure.

Oh, gag me with a spoon! (BTW, we shouldn't have to emulate big real
mode if the CPU supports unrestricted guest mode. If we do, something
is probably wrong.)

> > You seem to be assuming that the instruction at CS:IP will still be
> > RDMSR (or WRMSR) after returning from userspace, and we will come
> > through kvm_{get,set}_msr_user_space again at the next KVM_RUN. That
> > isn't necessarily the case, for a variety of reasons. I think the
>
> Do you have a particular situation in mind where that would not be the
> case and where we would still want to actually complete an MSR operation
> after the environment changed?

As far as userspace is concerned, if it has replied with error=0, the
instruction has completed and retired. If the kernel executes a
different instruction at CS:RIP, the state is certainly inconsistent
for WRMSR exits. It would also be inconsistent for RDMSR exits if the
RDMSR emulation on the userspace side had any side-effects.
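
To make the concern concrete, here is a minimal toy model in C of
finishing the instruction from a completion callback instead of
re-decoding whatever sits at CS:RIP on the next run. It is only a
sketch: every name in it (toy_vcpu, toy_step, complete_wrmsr, and so
on) is hypothetical and none of it is actual KVM code. The callback
stands in for the role that complete_userspace_io plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in KVM. */
enum toy_insn { INSN_NOP = 0, INSN_WRMSR = 1 };

struct toy_vcpu {
    enum toy_insn mem[4];   /* guest "code", indexed by rip */
    unsigned int rip;
    /* Set when an MSR exit is waiting for user space's answer. */
    int (*complete)(struct toy_vcpu *v, int user_error);
    int wrmsr_retired;      /* WRMSRs completed successfully */
    int gp_injected;        /* faults requested by user space */
};

/* Finish the WRMSR that caused the exit, using user space's verdict. */
static int complete_wrmsr(struct toy_vcpu *v, int user_error)
{
    if (user_error) {
        v->gp_injected++;   /* rip stays put, as for a faulting insn */
    } else {
        v->rip++;           /* instruction retires exactly once */
        v->wrmsr_retired++;
    }
    return 0;
}

/* Run any pending completion first; never look at guest memory here. */
static void toy_complete_pending(struct toy_vcpu *v, int user_error)
{
    if (v->complete) {
        int (*cb)(struct toy_vcpu *, int) = v->complete;

        v->complete = NULL;
        cb(v, user_error);
    }
}

/* Execute one instruction; returns 1 if we must exit to user space. */
static int toy_step(struct toy_vcpu *v)
{
    if (v->mem[v->rip] == INSN_WRMSR) {
        v->complete = complete_wrmsr;   /* remember how to finish */
        return 1;
    }
    v->rip++;
    return 0;
}

/* One "KVM_RUN": complete the previous exit, then keep executing. */
static int toy_run(struct toy_vcpu *v, int user_error)
{
    toy_complete_pending(v, user_error);
    return toy_step(v);
}
```

In this model the WRMSR is retired by the callback even if the toy
guest memory at the old rip has changed between the two runs, which is
the property that matters once userspace has answered with error=0.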

> > 'completion' of the userspace instruction emulation should be done
> > with the complete_userspace_io [sic] mechanism instead.
>
> Hm, that would avoid a roundtrip into guest mode, but add a cycle
> through the in-kernel emulator. I'm not sure that's a net win quite yet.
>
> >
> > I'd really like to see this mechanism apply only in the case of
> > invalid/unknown MSRs, and not for illegal reads/writes as well.
>
> Why? Any #GP inducing MSR access will be on the slow path. What's the
> problem if you get a few more of them in user space that you just bounce
> back as failing, so they actually do inject a fault?

I'm not concerned about the performance. I think I'm just biased
because of what we have today. But since we're planning on dropping
that anyway, I take it back. IIRC, the plumbing to make the
distinction is a little painful, and I don't want to ask you to go
there.
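
For illustration only, the distinction I had in mind could be sketched
as a three-way result from the MSR handler, so the caller can tell an
MSR nobody knows about from a known MSR that rejected the access. This
is a hypothetical toy in C, not the patch and not existing KVM code:
the enum, the function names and the valid-bits mask are all made up
for the example.

```c
#include <stdint.h>

/* Hypothetical result codes; not taken from KVM. */
enum toy_msr_result {
    TOY_MSR_OK = 0,       /* handled, access succeeded */
    TOY_MSR_FAULT = 1,    /* known MSR, but the access is illegal */
    TOY_MSR_UNKNOWN = 2,  /* no in-kernel handler for this index */
};

#define TOY_MSR_EFER        0xc0000080u
/* Made-up mask of writable bits, for this toy only. */
#define TOY_EFER_VALID_MASK 0xd01ull

/* Toy write handler that knows exactly one MSR. */
static enum toy_msr_result toy_set_msr(uint32_t index, uint64_t data,
                                       uint64_t *efer)
{
    switch (index) {
    case TOY_MSR_EFER:
        if (data & ~TOY_EFER_VALID_MASK)
            return TOY_MSR_FAULT;
        *efer = data;
        return TOY_MSR_OK;
    default:
        return TOY_MSR_UNKNOWN;
    }
}

/*
 * Policy knob: deflect only unknown MSRs, or illegal accesses too.
 * With deflect_faults == 0, a TOY_MSR_FAULT would turn into a #GP
 * without a trip through user space.
 */
static int toy_should_exit_to_user(enum toy_msr_result r,
                                   int deflect_faults)
{
    return r == TOY_MSR_UNKNOWN ||
           (deflect_faults && r == TOY_MSR_FAULT);
}
```

The painful part in the real code is getting every existing handler to
report which of the two failure cases it hit; the toy sidesteps that by
having a single handler.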