> -int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> -			 int *root_level)
> +static int __kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
> +				  enum kvm_tdp_mmu_root_types root_type)
>  {
> -	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
> +	struct kvm_mmu_page *root = tdp_mmu_get_root(vcpu, root_type);

I think this function should take the struct kvm_mmu_page * directly.

> +{
> +	*root_level = vcpu->arch.mmu->root_role.level;
> +
> +	return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, KVM_DIRECT_ROOTS);

Here you pass root_to_sp(vcpu->arch.mmu->root.hpa);

> +int kvm_tdp_mmu_get_walk_mirror_pfn(struct kvm_vcpu *vcpu, u64 gpa,
> +				    kvm_pfn_t *pfn)
> +{
> +	u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
> +	int leaf;
> +
> +	lockdep_assert_held(&vcpu->kvm->mmu_lock);
> +
> +	rcu_read_lock();
> +	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, KVM_MIRROR_ROOTS);

and likewise here.

You might also consider using a kvm_mmu_root_info for the mirror root, even
though the pgd field is not used.  Then __kvm_tdp_mmu_get_walk() can take a
struct kvm_mmu_root_info * instead.

kvm_tdp_mmu_get_walk_mirror_pfn() doesn't belong in this series, but
introducing __kvm_tdp_mmu_get_walk() can stay here.

Looking at the later patch, which uses kvm_tdp_mmu_get_walk_mirror_pfn(),
I think this function is a bit overkill.  I'll do a pre-review of the
init_mem_region function, especially the usage of kvm_gmem_populate:

+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	if (!kvm_slot_can_be_private(slot) || !kvm_mem_is_private(kvm, gfn)) {
+		ret = -EFAULT;
+		goto out_put_page;
+	}

The slots_lock is taken, so checking kvm_slot_can_be_private is unnecessary.
Checking kvm_mem_is_private perhaps should also be done in
kvm_gmem_populate() itself.  I'll send a patch.

+	read_lock(&kvm->mmu_lock);
+
+	ret = kvm_tdp_mmu_get_walk_mirror_pfn(vcpu, gpa, &mmu_pfn);
+	if (ret < 0)
+		goto out;
+	if (ret > PG_LEVEL_4K) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (mmu_pfn != pfn) {
+		ret = -EAGAIN;
+		goto out;
+	}

If you require pre-faulting, you don't need to return mmu_pfn, and things
would be seriously wrong if the two didn't match, wouldn't they?  You are
executing with the filemap_invalidate_lock() taken, and therefore cannot
race with kvm_gmem_punch_hole().  (Soapbox mode on: this is the advantage
of putting the locking/looping logic in common code, kvm_gmem_populate()
in this case.)

Therefore, I think kvm_tdp_mmu_get_walk_mirror_pfn() can be replaced with
just:

bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
{
	struct kvm *kvm = vcpu->kvm;
	bool is_direct = !kvm_has_mirrored_tdp(...) ||
			 (gpa & kvm->arch.direct_mask);
	hpa_t root = is_direct ? ... : ...;
	u64 sptes[PT64_ROOT_MAX_LEVEL + 1], spte;
	int leaf;

	lockdep_assert_held(&vcpu->kvm->mmu_lock);

	rcu_read_lock();
	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
	rcu_read_unlock();
	if (leaf < 0)
		return false;

	spte = sptes[leaf];
	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
}
EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);

+	while (region.nr_pages) {
+		if (signal_pending(current)) {
+			ret = -EINTR;
+			break;
+		}

Checking signal_pending() should be done in kvm_gmem_populate() - again,
I'll take care of that.  The rest of the loop is not necessary - just call
kvm_gmem_populate() with the whole range and enjoy.  You get a nice API
that is consistent with the intended KVM_PREFAULT_MEMORY ioctl, because
kvm_gmem_populate() returns the number of pages it has processed and you
can use that to massage and copy back the struct kvm_tdx_init_mem_region.

+	arg = (struct tdx_gmem_post_populate_arg) {
+		.vcpu = vcpu,
+		.flags = cmd->flags,
+	};
+	gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
+				     u64_to_user_ptr(region.source_addr),
+				     1, tdx_gmem_post_populate, &arg);
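With the loop gone, the call site could shrink to something along these
lines (a rough sketch only, not tested; I'm reusing the identifiers from
the hunks above, and treating cmd->data as the userspace pointer for the
copy-back is my assumption about the surrounding code):

	arg = (struct tdx_gmem_post_populate_arg) {
		.vcpu = vcpu,
		.flags = cmd->flags,
	};
	gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
				     u64_to_user_ptr(region.source_addr),
				     region.nr_pages, tdx_gmem_post_populate,
				     &arg);
	if (gmem_ret < 0) {
		ret = gmem_ret;
		goto out;
	}

	/* Advance the region by however many pages were populated... */
	region.gpa += gmem_ret << PAGE_SHIFT;
	region.source_addr += gmem_ret << PAGE_SHIFT;
	region.nr_pages -= gmem_ret;

	/* ...and copy the massaged struct back to userspace. */
	if (copy_to_user(u64_to_user_ptr(cmd->data), &region, sizeof(region)))
		ret = -EFAULT;

Partial progress then falls out naturally: userspace sees how far the
populate got and can retry, which is the same model the prefault ioctl
will use.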
Paolo