On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote: > When shadowing a GTT entry with a 2M page, explicitly verify that the > first page pinned by VFIO is a transparent hugepage instead of assuming > that page observed by is_2MB_gtt_possible() is the same page pinned by > vfio_pin_pages(). E.g. if userspace is doing something funky with the > guest's memslots, or if the page is demoted between is_2MB_gtt_possible() > and vfio_pin_pages(). > > This is more of a performance optimization than a bug fix as the check > for contiguous struct pages should guard against incorrect mapping (even > though assuming struct pages are virtually contiguous is wrong). > > The real motivation for explicitly checking for a transparent hugepage > after pinning is that it will reduce the risk of introducing a bug in a > future fix for a page refcount leak (KVMGT doesn't put the reference > acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using > KVM's gfn_to_pfn() altogether. > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++-- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c > index 8ae7039b3683..90997cc385b4 100644 > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c > @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn, > goto err; > } > > - if (npage == 0) > - base_page = cur_page; > + if (npage == 0) { > + /* > + * Bail immediately to avoid unnecessary pinning when > + * trying to shadow a 2M page and the host page isn't > + * a transparent hugepage. > + * > + * TODO: support other type hugepages, e.g. HugeTLB. > + */ > + if (size == I915_GTT_PAGE_SIZE_2M && > + !PageTransHuge(cur_page)) Maybe the checking of PageTransHuge(cur_page) and bailing out is not necessary. If a page is not transparent huge, but there are 512 contigous 4K pages, I think it's still good to map them in IOMMU in 2M. See vfio_pin_map_dma() who does similar things. > + ret = -EIO; > + else > + base_page = cur_page; > + } > else if (base_page + npage != cur_page) { > gvt_vgpu_err("The pages are not continuous\n"); > ret = -EINVAL; > + } > + if (ret < 0) { > npage++; > goto err; > } > -- > 2.40.0.rc1.284.g88254d51c5-goog >