Re: [PATCH] KVM: x86/mmu: optimizing the code in mmu_try_to_unsync_pages

Yun Lu <luyun_611@xxxxxxx> · Mon, 23 May 2022 17:27:39 +0800

On 2022/5/20 10:47 PM, Sean Christopherson wrote:

On Fri, May 20, 2022, Yuan Yao wrote:
On Fri, May 20, 2022 at 02:09:07PM +0800, Yun Lu wrote:
There is no need to check can_unsync and prefetch in the loop
every time, just move this check before the loop.

Signed-off-by: Yun Lu <luyun@xxxxxxxxxx>
---
  arch/x86/kvm/mmu/mmu.c | 12 ++++++------
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 311e4e1d7870..e51e7735adca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2534,6 +2534,12 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
  	if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
  		return -EPERM;

+	if (!can_unsync)
+		return -EPERM;
+
+	if (prefetch)
+		return -EEXIST;
+
  	/*
  	 * The page is not write-tracked, mark existing shadow pages unsync
  	 * unless KVM is synchronizing an unsync SP (can_unsync = false).  In
@@ -2541,15 +2547,9 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
  	 * allowing shadow pages to become unsync (writable by the guest).
  	 */
  	for_each_gfn_indirect_valid_sp(kvm, sp, gfn) {
-		if (!can_unsync)
-			return -EPERM;
-
  		if (sp->unsync)
  			continue;

-		if (prefetch)
-			return -EEXIST;
-

Consider the case where the for_each_gfn_indirect_valid_sp() loop body
never runs, i.e. the gfn is not an MMU page table page:

The old behavior in that case: return 0;
The new behavior with this change: return -EPERM / -EEXIST;

At a minimum, this breaks FNAME(sync_page) -> make_spte(prefetch = true, can_unsync = false),
which then unexpectedly removes PT_WRITABLE_MASK from the last-level mapping.

Yep, the flags should be queried if and only if there's at least one valid, indirect
SP for the gfn.  And querying whether there's such an SP is quite expensive and requires
looping over a list, so checking on every iteration of the loop is far cheaper.  E.g. each
check is a single uop on modern CPUs as both gcc and clang are smart enough to stash
the flags in registers so that there's no reload from memory on each iteration.  And that
also means the CPU can more than likely correctly predict subsequent iterations.

OK, it's my mistake.  Thanks for your answers.