On Wed, Sep 21, 2022, David Matlack wrote:
> Split out the page fault handling for the TDP MMU to a separate
> function. This creates some duplicate code, but makes the TDP MMU fault
> handler simpler to read by eliminating branches and will enable future
> cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.

IMO, the code duplication isn't worth the relatively minor simplification.
And some of the "cleanups" to reduce the duplication arguably add
complexity, e.g. moving code around in "Initialize fault.{gfn,slot}
earlier for direct MMUs" isn't a net positive IMO.

With the TDP MMU being all-or-nothing, the need to check if the MMU is a
TDP MMU goes away.  If the TDP MMU is enabled, direct_page_fault() will be
reached only for non-nested TDP page faults: paging is required for NPT,
and EPT faults will always go through ept_page_fault(), i.e.
nonpaging_page_fault() is unused.  In other words, this code

	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);

gets replaced with direct checks on is_tdp_mmu_enabled().  And because
fast page faults are used only for direct MMUs, that code can also simply
query whether or not the TDP MMU is enabled.

Avoiding make_mmu_pages_available() for the TDP MMU is easy enough and
doesn't require any new conditionals:

	if (is_tdp_mmu_enabled()) {
		r = kvm_tdp_mmu_map(vcpu, fault);
	} else {
		r = make_mmu_pages_available(vcpu);
		if (r)
			goto out_unlock;

		r = __direct_map(vcpu, fault);
	}

That leaves the read vs. write lock as the only code that is borderline
difficult to read.  I agree that's a little ugly, but IMO it's not worth
duplicating the entire function.  If we really wanted to hide that
ugliness, we could add helpers to do the lock+unlock, but even that seems
somewhat superfluous.

From a performance perspective, with the flag being read-only after the
vendor module is loaded, we can add a static key to track TDP MMU
enabling, biased to the TDP MMU being enabled.  That turns all of the
if-statements into NOPs for the TDP MMU
(this is why I suggested keeping the is_tdp_mmu_enabled() wrapper in
patch 1).  Static branches actually shrink the code size too (because
that matters), e.g. a 5-byte NOP instead of 8 bytes for TEST+Jcc.

I'll speculatively post a v4 with the static key idea, minus patches 7,
8, and 10.  I think that'll make it easier to discuss the static
key/branch implementation.  If we decide we still want to split out the
TDP MMU page fault handler, then it should be relatively simple to fold
those patches back in.

E.g. the generated code looks like this:

  4282		if (is_tdp_mmu_enabled())
  4283			read_lock(&vcpu->kvm->mmu_lock);
     0x000000000005ff7e <+494>:	nopl   0x0(%rax,%rax,1)
     0x000000000005ff83 <+499>:	mov    (%rbx),%rdi
     0x000000000005ff86 <+502>:	call   0x5ff8b <direct_page_fault+507>

  4286
  4287		if (is_page_fault_stale(vcpu, fault))
     0x000000000005ff8b <+507>:	mov    %r15,%rsi
     0x000000000005ff8e <+510>:	mov    %rbx,%rdi
     0x000000000005ff91 <+513>:	call   0x558b0 <is_page_fault_stale>
     0x000000000005ff96 <+518>:	test   %al,%al
     0x000000000005ff98 <+520>:	jne    0x603e1 <direct_page_fault+1617>

  4288			goto out_unlock;
  4289
  4290		if (is_tdp_mmu_enabled()) {
  4291			r = kvm_tdp_mmu_map(vcpu, fault);
     0x000000000005ff9e <+526>:	nopl   0x0(%rax,%rax,1)
     0x000000000005ffa3 <+531>:	mov    %rbx,%rdi
     0x000000000005ffa6 <+534>:	mov    %r15,%rsi
     0x000000000005ffa9 <+537>:	call   0x5ffae <direct_page_fault+542>
     0x000000000005ffae <+542>:	mov    (%rbx),%rdi
     0x000000000005ffb1 <+545>:	mov    %eax,%ebp

where is_tdp_mmu_enabled() is implemented as:

#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_TRUE(tdp_mmu_enabled);
#endif

static inline bool is_tdp_mmu_enabled(void)
{
#ifdef CONFIG_X86_64
	return static_branch_likely(&tdp_mmu_enabled);
#else
	return false;
#endif
}

static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
{
	return is_tdp_mmu_enabled() && mmu->root_role.direct;
}