Re: About two-dimensional page translation (e.g., Intel EPT) and shadow page table in Linux QEMU/KVM

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 28 Jul 2021 20:01:18 +0000

On Wed, Jul 28, 2021, harry harry wrote:
> Sean, sorry for the late reply. Thanks for your careful explanations.
> 
> > For emulation of any instruction/flow that starts with a guest virtual address.
> > On Intel CPUs, that includes quite literally any "full" instruction emulation,
> > since KVM needs to translate CS:RIP to a guest physical address in order to fetch
> > the guest's code stream.  KVM can't avoid "full" emulation unless the guest is
> > heavily enlightened, e.g. to avoid string I/O, among many other things.
> 
> Do you mean the emulated MMU is needed when it *only* wants to
> translate GVAs to GPAs in the guest level?

Not quite, though gva_to_gpa() is the main use.  The emulated MMU is also used to
inject guest #PF and to load/store guest PDTPRs.  

> In such cases, the hardware MMU cannot be used because hardware MMU
> can only translate GVAs to HPAs, right?

Sort of.  The hardware MMU does translate GVA to GPA, but the GPA value is not
visible to software (unless the GPA->HPA translation faults).  That's also true
for VA to PA (and GVA to HPA).  Irrespective of virtualization, x86 ISA doesn't
provide an instruction to retrive the PA for a given VA.

If such an instruction did exist, and it was to be usable for a VMM to do a
GVA->GPA translation, the magic instruction would need to take all MMU params as
operands, e.g. CR0, CR3, CR4, and EFER.  When KVM is active (not the guest), the
hardware MMU is loaded with the host MMU configuration, not the guest.  In both
VMX and SVM, vCPU state is mostly ephemeral in the sense that it ceases to exist
in hardware when the vCPU exits to the host.  Some state is retained in hardware,
e.g. TLB and cache entries, but those are associated with select properties of
the vCPU, e.g. EPTP, CR3, etc..., not with the vCPU itself, i.e. not with the
VMCS (VMX) / VMCB (SVM).