On Wed, Aug 18, 2021, Robert Hoo wrote:
> > Limiting this to VMREAD/VMWRITE means we shouldn't need a bitmap and
> > can use a more static lookup, e.g. a switch statement.
>
> Emm, hard for me to choose:
>
> Your approach sounds more efficient for the CPU: once the VMX MSRs are
> updated, there's no need to update a bitmap; each field's existence
> check directly consults the related VMX MSR. But the switch statement
> will be long...

How long? Honest question, off the top of my head I don't have a feel
for how many fields conditionally exist.

> My implementation: once a VMX MSR is updated, the update needs to be
> propagated to the bitmap, which is one extra step compared to the
> above. But later, when querying field existence, especially for fields
> that depend on vm{entry,exit}_ctrl, we would usually have to consult
> both MSRs if there is no bitmap, and we cannot guarantee there won't be
> more complicated dependencies in the future. With the bitmap, the query
> is just a 1-bit read; without it, several MSR reads and compares are
> needed.

Yes, but the bitmap is per-VM and may or may not be cache-hot for
back-to-back VMREAD/VMWRITE to different fields, whereas the shadow
controls are much more likely to reside somewhere in the caches.

> And the VMX MSR --> bitmap update usually happens only once, when the
> vCPU model is settled, whereas VMREAD/VMWRITE might happen frequently,
> depending on the guest. I'd rather leave the complicated comparisons in
> the former than in the latter.

I'm not terribly concerned about the runtime performance, it's the
extra per-VM allocation for something that's not thaaat interesting
that I don't like. And for performance, most of the frequently accessed
VMCS fields will be shadowed anyways, i.e. won't VM-Exit in the first
place.

And that brings up another wrinkle. The shadow VMCS bitmaps are global
across all VMs, e.g. if the preemption timer is supported in hardware
but hidden from L1, then a misbehaving L1 can VMREAD/VMWRITE the field
even with this patch.
If it was just the preemption timer we could consider disabling shadow
VMCS for the VM if the timer exists but is hidden from L1, but
GUEST_PML_INDEX and GUEST_INTR_STATUS are also conditional :-(

Maybe there's a middle ground, e.g. let userspace tell KVM which fields
it plans on exposing to L1, use that to build the bitmaps, and disable
shadow VMCS if userspace creates VMs that don't match the specified
configuration. Burning three more pages per VM isn't very enticing...

This is quite the complicated mess for something I'm guessing no one
actually cares about. At what point do we chalk this up as a
virtualization hole and sweep it under the rug?