On Tue, Sep 3, 2019 at 5:59 PM Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> wrote: > > > > On 09/01/2019 05:33 PM, Jim Mattson wrote: > > On Fri, Aug 30, 2019 at 4:15 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote: > > On Fri, Aug 30, 2019 at 4:07 PM Krish Sadhukhan > <krish.sadhukhan@xxxxxxxxxx> wrote: > > On 08/29/2019 03:26 PM, Jim Mattson wrote: > > On Thu, Aug 29, 2019 at 2:25 PM Krish Sadhukhan > <krish.sadhukhan@xxxxxxxxxx> wrote: > > According to section "Checks on Guest Control Registers, Debug Registers, and > and MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry > of nested guests: > > If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7 > field must be 0. > > Can't we just let the hardware check guest DR7? This results in > "VM-entry failure due to invalid guest state," right? And we just > reflect that to L1? > > Just trying to understand the reason why this particular check can be > deferred to the hardware. > > The vmcs02 field has the same value as the vmcs12 field, and the > physical CPU has the same requirements as the virtual CPU. > > Actually, you're right. There is a problem. With the current > implementation, there's a priority inversion if the vmcs12 contains > both illegal guest state for which the checks are deferred to > hardware, and illegal entries in the VM-entry MSR-load area. In this > case, we will synthesize a "VM-entry failure due to MSR loading" > rather than a "VM-entry failure due to invalid guest state." > > There are so many checks on guest state that it's really compelling to > defer as many as possible to hardware. However, we need to fix the > aforesaid priority inversion. Instead of returning early from > nested_vmx_enter_non_root_mode() with EXIT_REASON_MSR_LOAD_FAIL, we > could induce a "VM-entry failure due to MSR loading" for the next > VM-entry of vmcs02 and continue with the attempted vmcs02 VM-entry. If > hardware exits with EXIT_REASON_INVALID_STATE, we reflect that to L1, > and if hardware exits with EXIT_REASON_INVALID_STATE, we reflect that > to L1 (along with the appropriate exit qualification). > > > Looking at nested_vmx_exit_reflected(), it seems we do return to L1 if the error is EXIT_REASON_INVALID_STATE. So if we fix the priority inversion, this should work then ? Yes. > The tricky part is in undoing the successful MSR writes if we reflect > EXIT_REASON_INVALID_STATE to L1. Some MSR writes can't actually be > undone (e.g. writes to IA32_PRED_CMD), but maybe we can get away with > those. (Fortunately, it's illegal to put x2APIC MSRs in the VM-entry > MSR-load area!) Other MSR writes are just a bit tricky to undo (e.g. > writes to IA32_TIME_STAMP_COUNTER). > > > Let's say that the priority inversion issue is fixed. In the scenario in which the Guest state is fine but the VM-entry MSR-Load area contains an illegal entry, you are saying that the induced "VM-entry failure due to MSR loading" will be caught during the next VM-entry of vmcs02. So how far does the attempted VM-entry of vmcs02 continue with an illegal MSR-Load entry and how do get to the next VM-entry of vmcs02 ? Sorry; I don't understand the questions. > > There are two other scenarios there: > > 1. Guest state is illegal and VM-entry MSR-Load area contains an illegal entry > 2. Guest state is illegal but VM-entry MSR-Load area is fine > > In these scenarios, L2 will exit to L1 with EXIT_REASON_INVALID_STATE and finally this will be returned to L1 userspace. Right ? If so, we do we care about reverting MSR-writes because the SDM section 26.8 say, > > "Processor state is loaded as would be done on a VM exit (see Section 27.5)" I'm not sure how the referenced section of the SDM is relevant. Are you assuming that every MSR in the VM-entry MSR load area also appears in the VM-exit MSR load area? That certainly isn't the case. > Alternatively, we could perform validity checks on the entire vmcs12 > VM-entry MSR-load area before writing any of the MSRs. This may be > easier, but it would certainly be slower. We would have to be wary of > situations where processing an earlier entry affects the validity of a > later entry. (If we take this route, then we would also have to > process the valid prefix of the VM-entry MSR-load area when we reflect > EXIT_REASON_MSR_LOAD_FAIL to L1.) Forget this paragraph. Even if all of the checks pass, we still have to undo all of the MSR-writes in the event of a deferred "VM-entry failure due to invalid guest state." > Note that this approach could be extended to permit the deferral of > some control field checks to hardware as well. > > > Why can't the first approach be used for VM-entry controls as well ? Sorry; I don't understand this question either. > As long as the control > field is copied verbatim from vmcs12 to vmcs02 and the virtual CPU > enforces the same constraints as the physical CPU, deferral should be > fine. We just have to make sure that we induce a "VM-entry failure due > to invalid guest state" for the next VM-entry of vmcs02 if any > software checks on guest state fail, rather than immediately > synthesizing an "VM-entry failure due to invalid guest state" during > the construction of vmcs02. > > > Is it OK to keep this Guest check in software for now and then remove it once we have a solution in place ? Why do you feel that getting the priority correct is so important for this one check in particular? I'd be surprised if any hypervisor ever assembled a VMCS that failed this check.