Re: [PATCH v2 04/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Tue, 14 Mar 2023 11:09:20 +0800

On Fri, Mar 10, 2023 at 04:22:35PM -0800, Sean Christopherson wrote:
> Honor KVM's max allowed page size when determining whether or not a 2MiB
> GTT shadow page can be created for the guest.  Querying KVM's max allowed
> size is somewhat odd as there's no strict requirement that KVM's memslots
> and VFIO's mappings are configured with the same gfn=>hva mapping, but
> the check will be accurate if userspace wants to have a functional guest,
> and at the very least checking KVM's memslots guarantees that the entire
> 2MiB range has been exposed to the guest.
>
hi Sean,
I remember in our last discussion, the conclusion was that
we can safely just use VFIO ABI (which is intel_gvt_dma_map_guest_page()
introduced in patch 7) to check max mapping size. [1][2]

"Though checking kvm_page_track_max_mapping_level() is also fine, it makes DMA
mapping size unnecessarily smaller."
This is especially true when the guest page is page tracked by kvm internally
for nested VMs, but is not page tracked by kvmgt as ppgtt page table pages.
kvmgt is ok to map those pages as huge when 4k is returned in
kvm_page_track_max_mapping_level() for this reason. 

"I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
and permissions, and that the only requirement for KVM memslots is that GTT page
tables need to be visible in KVM's memslots.  But if that's the ABI, then
intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

In other words, pick either VFIO or KVM.  Checking that X is valid according to
KVM and then mapping X through VFIO is confusing and makes assumptions about how
userspace configures KVM and VFIO.  It works because QEMU always configures KVM
and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
unaware readers because the code is technically flawed.
"

[1] https://lore.kernel.org/all/Y7Y+759IN2DH5h3h@xxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/Y7cLkLUMCy+XLRwm@xxxxxxxxxx/

> Note, KVM may also restrict the mapping size for reasons that aren't
> relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> is write-tracked (KVM's write-tracking only handles writes from vCPUs).
> However, such scenarios are unlikely to occur with a well-behaved guest,
> and at worst will result in sub-optimal performance.

As being confirmed in [3], there's no risk of iTLB multi-hit even for
not-well-behaved guests if page tables for DMA mappings in IOMMU page tables
are in a separated set of tables from EPT/NPT (which is the by default
condition currently).

[3] https://lore.kernel.org/all/Y7%2FFZpizEyIaL+Su@xxxxxxxxxxxxxxxxxxxxxxxxx/

So, I'm fine with exporting this kvm_page_track_max_mapping_level()
interface, but I don't think KVMGT is a user of it. 
> 
> Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
> Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/include/asm/kvm_page_track.h |  2 ++
>  arch/x86/kvm/mmu/page_track.c         | 18 ++++++++++++++++++
>  drivers/gpu/drm/i915/gvt/gtt.c        | 10 +++++++++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
> index eb186bc57f6a..3f72c7a172fc 100644
> --- a/arch/x86/include/asm/kvm_page_track.h
> +++ b/arch/x86/include/asm/kvm_page_track.h
> @@ -51,6 +51,8 @@ void kvm_page_track_cleanup(struct kvm *kvm);
>  
>  bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
>  int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level);
>  
>  void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
>  int kvm_page_track_create_memslot(struct kvm *kvm,
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 0a2ac438d647..e739dcc3375c 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -301,3 +301,21 @@ void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
>  			n->track_flush_slot(kvm, slot, n);
>  	srcu_read_unlock(&head->track_srcu, idx);
>  }
> +
> +enum pg_level kvm_page_track_max_mapping_level(struct kvm *kvm, gfn_t gfn,
> +					       enum pg_level max_level)
> +{
> +	struct kvm_memory_slot *slot;
> +	int idx;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +	slot = gfn_to_memslot(kvm, gfn);
> +	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
> +		max_level = PG_LEVEL_4K;
> +	else
> +		max_level = kvm_mmu_max_slot_mapping_level(slot, gfn, max_level);
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return max_level;
> +}
> +EXPORT_SYMBOL_GPL(kvm_page_track_max_mapping_level);
> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
> index f30922c55a0c..d59c7ab9d224 100644
> --- a/drivers/gpu/drm/i915/gvt/gtt.c
> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> @@ -1157,14 +1157,22 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
>  	struct intel_gvt_gtt_entry *entry)
>  {
>  	const struct intel_gvt_gtt_pte_ops *ops = vgpu->gvt->gtt.pte_ops;
> +	unsigned long gfn = ops->get_pfn(entry);
>  	kvm_pfn_t pfn;
> +	int max_level;
>  
>  	if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
>  		return 0;
>  
>  	if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status))
>  		return -EINVAL;
> -	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
> +
> +	max_level = kvm_page_track_max_mapping_level(vgpu->vfio_device.kvm,
> +						     gfn, PG_LEVEL_2M);
> +	if (max_level < PG_LEVEL_2M)
> +		return 0;
> +
> +	pfn = gfn_to_pfn(vgpu->vfio_device.kvm, gfn);
>  	if (is_error_noslot_pfn(pfn))
>  		return -EINVAL;
>  
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
>