Re: [PATCH 2/4] KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests

Jim Mattson <jmattson@xxxxxxxxxx> · Wed, 4 Sep 2019 09:44:58 -0700

On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan
<krish.sadhukhan@xxxxxxxxxx> wrote:
>
>
>
> On 09/01/2019 05:33 PM, Jim Mattson wrote:
>
> On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
>
> On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan
> <krish.sadhukhan@xxxxxxxxxx> wrote:
>
> On 08/29/2019 03:26 PM, Jim Mattson wrote:
>
> On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan
> <krish.sadhukhan@xxxxxxxxxx> wrote:
>
> According to section "Checks on Guest Control Registers, Debug Registers, and
> and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
> of nested guests:
>
>      If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
>      field must be 0.
>
> Can't we just let the hardware check guest DR7? This results in
> "VM-entry failure due to invalid guest state," right? And we just
> reflect that to L1?
>
> Just trying to understand the reason why this particular check can be
> deferred to the hardware.
>
> The vmcs02 field has the same value as the vmcs12 field, and the
> physical CPU has the same requirements as the virtual CPU.
>
> Actually, you're right. There is a problem. With the current
> implementation, there's a priority inversion if the vmcs12 contains
> both illegal guest state for which the checks are deferred to
> hardware, and illegal entries in the VM-entry MSR-load area. In this
> case, we will synthesize a "VM-entry failure due to MSR loading"
> rather than a "VM-entry failure due to invalid guest state."
>
> There are so many checks on guest state that it's really compelling to
> defer as many as possible to hardware. However, we need to fix the
> aforesaid priority inversion. Instead of returning early from
> nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we
> could induce a "VM-entry failure due to MSR loading" for the next
> VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If
> hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1,
> and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that
> to L1 (along with the appropriate exit qualification).
>
>
> Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ?

Yes.

> The tricky part is in undoing the successful MSR writes if we reflect
> EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be
> undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with
> those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry
> MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g.
> writes to IA32_TIME_STAMP_COUNTER).
>
>
> Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry,  you are saying that the induced "VM-entry failure due to MSR loading"  will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02  continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ?

Sorry; I don't understand the questions.
>
> There are two other scenarios there:
>
>     1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry
>     2. Guest state is illegal but VM-entry MSR-Load area is fine
>
> In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ?  If so, we do we care about reverting MSR-writes  because the SDM section 26.8 say,
>
>         "Processor state is loaded as would be done on a VM exit (see Section 27.5)"

I'm not sure how the referenced section of the SDM is relevant. Are
you assuming that every MSR in the VM-entry MSR load area also appears
in the VM-exit MSR load area? That certainly isn't the case.

> Alternatively, we could perform validity checks on the entire vmcs12
> VM-entry MSR-load area before writing any of the MSRs. This may be
> easier, but it would certainly be slower. We would have to be wary of
> situations where processing an earlier entry affects the validity of a
> later entry. (If we take this route, then we would also have to
> process the valid prefix of the VM-entry MSR-load area when we reflect
> EXIT_REASON_MSR_LOAD_FAIL to L1.)

Forget this paragraph. Even if all of the checks pass, we still have
to undo all of the MSR-writes in the event of a deferred "VM-entry
failure due to invalid guest state."

> Note that this approach could be extended to permit the deferral of
> some control field checks to hardware as well.
>
>
> Why can't the first approach be used for VM-entry controls as well ?

Sorry; I don't understand this question either.

>  As long as the control
> field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU
> enforces the same constraints as the physical CPU, deferral should be
> fine. We just have to make sure that we induce a "VM-entry failure due
> to invalid guest state" for the next VM-entry of vmcs02 if any
> software checks on guest state fail, rather than immediately
> synthesizing an "VM-entry failure due to invalid guest state" during
> the construction of vmcs02.
>
>
> Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ?

Why do you feel that getting the priority correct is so important for
this one check in particular? I'd be surprised if any hypervisor ever
assembled a VMCS that failed this check.