On Tue, Jun 13, 2023 at 12:50:52PM -0700, Sean Christopherson wrote:
> On Fri, Jun 09, 2023, Dmytro Maluka wrote:
> > On 6/9/23 04:07, Chen, Jason CJ wrote:
> > > I think with a PV design, we can benefit from skipping shadowing. For
> > > example, a TLB flush could be done in the hypervisor directly, while
> > > shadow EPT needs to emulate it by destroying shadow EPT page table
> > > entries and then re-shadowing upon the next EPT violation.
>
> This is a bit misleading.  KVM has an effective TLB for nested TDP only for
> 4KiB pages; larger shadow pages are never allowed to go out-of-sync, i.e.
> KVM doesn't wait until L1 does a TLB flush to update SPTEs.  KVM does
> "unload" roots, e.g. to emulate INVEPT, but that usually just ends up being
> an extra slow TLB flush in L0, because nested TDP SPTEs rarely go unsync in
> practice.  The patterns hypervisors use to manage VM memory don't typically
> trigger the types of PTE modifications that result in unsync SPTEs.
>
> I actually have a (very tiny) patch sitting around somewhere to disable
> unsync support when TDP is enabled.  There is a very, very theoretical bug
> where KVM might fail to honor when a guest TDP PTE change is architecturally
> supposed to be visible, and the simplest fix (by far) is to disable unsync
> support.  Disabling TDP+unsync is a viable fix because unsync support is
> almost never used for nested TDP.  Legacy shadow paging, on the other hand,
> benefits *significantly* from unsync support, e.g. when the guest is
> managing CoW mappings.  I haven't gotten around to posting the patch to
> disable unsync on TDP purely because the flaw is almost comically
> theoretical.
>
> Anyways, the point is that the TLB flushing side of nested TDP isn't all
> that interesting.

Agree, thanks for pointing it out! I was thinking in terms of a comparison
with the current pKVM-on-x86 RFC solution.
:-( To me, the KVM page table shadowing mechanism (e.g., unsync & sync page)
is too heavy & complicated. If we have the KPOP solution, IIUC, we may be
able to totally remove all the shadowing stuff, right? :-)

BTW, KPOP may raise questions about how to support access tracking & dirty
page logging, which may need more PV interfaces to be added. MMIO faults
could be another issue if we want to keep the optimization based on EPT
misconfig for the IA platform.

> > Yeah indeed, good point.
> >
> > Is my understanding correct: TLB flush is still gonna be requested by
> > the host VM via a hypercall, but the benefit is that the hypervisor
> > merely needs to do INVEPT?
>
> Maybe?  A paravirt paging scheme could do whatever it wanted.  The APIs
> could be designed in such a way that L1 never needs to explicitly request a
> TLB flush, e.g. if the contract is that changes must always become
> immediately visible to L2.
>
> And TLB flushing is but one small aspect of page table shadowing.  With PV
> paging, L1 wouldn't need to manage hardware-defined page tables, i.e. could
> use any arbitrary data type.  E.g. KVM as L1 could use an XArray to track
> L2 mappings.  And L0 in turn wouldn't need to have vendor-specific code,
> i.e. pKVM on x86 (potentially *all* architectures) could have a single
> nested paging scheme for both Intel and AMD, as opposed to needing code to
> deal with the differences between EPT and NPT.
>
> A few months back, I mentally worked through the flows[*] (I forget why I
> was thinking about PV paging), and I'm pretty sure that adapting x86's TDP
> MMU to support PV paging would be easy-ish, e.g. kvm_tdp_mmu_map() would
> become an XArray insertion (to track the L2 mapping) + hypercall (to inform
> L1 of the new mapping).
>
> [*] I even thought of a catchy name, KVM Paravirt Only Paging, a.k.a.
> KPOP ;-)

--
Thanks

Jason CJ Chen