On Wed, Aug 18, 2021 at 4:11 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Wed, Aug 18, 2021, Robert Hoo wrote:
> > > Limiting this to VMREAD/VMWRITE means we shouldn't need a bitmap and
> > > can use a more static lookup, e.g. a switch statement.
> > Emm, hard for me to choose:
> >
> > Your approach sounds more efficient for CPU: Once VMX MSR's updated, no
> > bother to update the bitmap. Each field's existence check will directly
> > consult related VMX MSR. Well, the switch statement will be long...
>
> How long? Honest question, off the top of my head I don't have a feel for how
> many fields conditionally exist.
>
> > My this implementation: once VMX MSR's updated, the update needs to be
> > passed to bitmap, this is 1 extra step comparing to aforementioned
> > above. But, later, when query field existence, especially the those
> > consulting vm{entry,exit}_ctrl, they usually would have to consult both
> > MSRs if otherwise no bitmap, and we cannot guarantee if in the future
> > there's no more complicated dependencies. If using bitmap, this consult
> > is just 1-bit reading. If no bitmap, several MSR's read and compare
> > happen.
>
> Yes, but the bitmap is per-VM and likely may or may not be cache-hot for back-to-back
> VMREAD/VMWRITE to different fields, whereas the shadow controls are much more likely
> to reside somewhere in the caches.
>
> > And, VMX MSR --> bitmap, usually happens only once when vCPU model is
> > settled. But VMRead/VMWrite might happen frequently, depends on guest
> > itself. I'd rather leave complicated comparison in former than in
> > later.
>
> I'm not terribly concerned about the runtime performance, it's the extra per-VM
> allocation for something that's not thaaat interesting that I don't like.
>
> And for performance, most of the frequently accessed VMCS fields will be shadowed
> anyways, i.e. won't VM-Exit in the first place.
>
> And that brings up another wrinkle.
> The shadow VMCS bitmaps are global across
> all VMs, e.g. if the preemption timer is supported in hardware but hidden from
> L1, then a misbehaving L1 can VMREAD/VMWRITE the field even with this patch.
> If it was just the preemption timer we could consider disabling shadow VMCS for
> the VM if the timer exists but is hidden from L1, but GUEST_PML_INDEX and
> GUEST_INTR_STATUS are also conditional :-(
>
> Maybe there's a middle ground, e.g. let userspace tell KVM which fields it plans
> on exposing to L1, use that to build the bitmaps, and disable shadow VMCS if
> userspace creates VMs that don't match the specified configuration. Burning
> three more pages per VM isn't very enticing...
>
> This is quite the complicated mess for something I'm guessing no one actually
> cares about. At what point do we chalk this up as a virtualization hole and
> sweep it under the rug?

Good point! Note that hardware doesn't even get this right. See erratum CF77 in
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e7-v2-spec-update.pdf.
I'd cut and paste the text here, but Intel won't allow that.
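For readers following along, a rough sketch of the switch-based "static lookup"
Sean suggests might look like the following. This is illustrative only: the
field encodings and control bits match the SDM names, but struct vmx_caps and
vmcs_field_exists() are hypothetical, not KVM's actual helpers, and only three
of the conditional fields are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMCS field encodings (Intel SDM, Appendix B). */
#define GUEST_INTR_STATUS                 0x0810
#define GUEST_PML_INDEX                   0x0812
#define VMX_PREEMPTION_TIMER_VALUE        0x482e

/* Controls that gate each field's existence. */
#define PIN_BASED_VMX_PREEMPTION_TIMER    (1u << 6)
#define SECONDARY_EXEC_VIRT_INTR_DELIVERY (1u << 9)
#define SECONDARY_EXEC_ENABLE_PML         (1u << 17)

/*
 * Hypothetical per-vCPU snapshot of the allowed-1 control bits exposed
 * to L1 via the VMX capability MSRs.
 */
struct vmx_caps {
	uint32_t pinbased_ctls;
	uint32_t secondary_ctls;
};

/*
 * A conditional field exists for L1 iff the control that enables it is
 * exposed in the capability MSRs; everything not listed is treated as
 * unconditional. This is the per-access MSR consultation that Robert's
 * per-VM bitmap would replace with a one-time build plus a 1-bit read.
 */
static bool vmcs_field_exists(const struct vmx_caps *caps, uint32_t field)
{
	switch (field) {
	case VMX_PREEMPTION_TIMER_VALUE:
		return caps->pinbased_ctls & PIN_BASED_VMX_PREEMPTION_TIMER;
	case GUEST_PML_INDEX:
		return caps->secondary_ctls & SECONDARY_EXEC_ENABLE_PML;
	case GUEST_INTR_STATUS:
		return caps->secondary_ctls & SECONDARY_EXEC_VIRT_INTR_DELIVERY;
	default:
		return true;
	}
}
```

Note the sketch sidesteps the thread's real sticking point: nothing here helps
when shadow VMCS lets L1 access the field without exiting in the first place.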