Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Thu, 19 Jan 2023 13:26:58 +0800

On Thu, Jan 19, 2023 at 10:58:42AM +0800, Zhenyu Wang wrote:
> On 2023.01.11 17:55:04 +0000, Sean Christopherson wrote:
> > On Mon, Jan 09, 2023, Yan Zhao wrote:
> > > On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> > > > On Fri, Jan 06, 2023, Yan Zhao wrote:
> > > > > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > > > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > > > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > > > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > > > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > > > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > > > > ("drm/i915/gvt: validate gfn before set shadow page entry").
> > > > > > 
> > > > > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > > > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > > > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > > > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > > > > unaware readers because the code is technically flawed.
> > > > > >
> > > > > Agreed. 
> > > > > Then after some further thought, I think maybe we can just remove
> > > > > intel_gvt_is_valid_gfn() in KVMGT, because
> > > > > 
> > > > > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > > > > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > > > > GFN.
> > > > > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > > > > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > > > > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> > > > 
> > > > IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> > > > gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> > > > means that bogus GFNs will trigger error messages, e.g.
> > > > 
> > > > 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> > > > 				     &cur_iova, ret);
> > > > 
> > > > and
> > > > 
> > > > 			gvt_vgpu_err("fail to populate guest ggtt entry\n");
> > > 
> > > Thanks for pointing it out.
> > > I checked this commit message and found below original intentions to introduce
> > > pre-validation:
> > 
> > ...
> > 
> > > (I actually found that the original code will print "invalid entry type"
> > > warning which indicates it's broken for a while due to lack of test in
> > > this invalid gfn path)
> > > 
> > > 
> > > > One thought would be to turn those printks into tracepoints to eliminate unwanted
> > > > noise, and to prevent the guest from spamming the host kernel log by programming
> > > > garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
> > > As those printks would not happen in normal conditions and printks may have
> > > some advantages to discover the attack or bug, could we just convert
> > > gvt_vgpu_err() to be ratelimited ?
> > 
> > That's ultimately a decision that needs to be made by the GVT maintainers, as the
> > answer depends on the use case.  E.g. if most users of KVMGT run a single VM and
> > the guest user is also the host admin, then pr_err_ratelimited() is likely an
> > acceptable/preferable choice as there's a decent chance a human will see the errors
> > in the host kernel logs and be able to take action.
> > 
> > But if there's unlikely to be a human monitoring the host logs, and/or the guest
> > user is unrelated to the host admin, then a ratelimited printk() is less useful.
> > E.g. if there's no one monitoring the logs, then losing messages due to
> > ratelimiting provides a worse debug experience overall than having to manually
> > enable tracepoints.   And if there may be many tens of VMs (seems unlikely?), then
> > ratelimited printk() is even less useful because errors for a specific VM may be
> > lost, i.e. the printk() can't be relied upon in any way to detect issues.
> > 
> > FWIW, in KVM proper, use of printk() to capture guest "errors" is strongly discourage
> > for exactly these reasons.
> 
> Current KVMGT usage is mostly in controlled mode, either user is own host admin,
> or host admin would pre-configure specific limited number of VMs for KVMGT use.
> I think printk on error should be fine, we don't need rate limit, and adding
> extra trace monitor for admin might not be necessary. So I'm towards to keep to
> use current error message.
> 

Thanks, Sean and Zhenyu.
So, could I just post the final fix as below?
And, Sean, would you like to include it in this series or should I send it out
first?

>From dcc931011da3712333f61684ebb20765dbf2fb46 Mon Sep 17 00:00:00 2001
From: Yan Zhao <yan.y.zhao@xxxxxxxxx>
Date: Thu, 19 Jan 2023 11:15:54 +0800
Subject: [PATCH] drm/i915/gvt: remove interface intel_gvt_is_valid_gfn

Currently intel_gvt_is_valid_gfn() is called in two places:
(1) shadowing guest GGTT entry
(2) shadowing guest PPGTT leaf entry,
which was introduced in commit cc753fbe1ac4
("drm/i915/gvt: validate gfn before set shadow page entry").

However, now it's not necessary to call this interface any more, because
a. GGTT partial write issue has been fixed by
   commit bc0686ff5fad
   ("drm/i915/gvt: support inconsecutive partial gtt entry write")
   commit 510fe10b6180
   ("drm/i915/gvt: fix a bug of partially write ggtt enties")
b. PPGTT resides in normal guest RAM and we only treat 8-byte writes
   as valid page table writes. Any invalid GPA found is regarded as
   an error, either due to guest misbehavior/attack or bug in host
   shadow code.
   So,rather than do GFN pre-checking and replace invalid GFNs with
   scratch GFN and continue silently, just remove the pre-checking and
   abort PPGTT shadowing on error detected.
c. GFN validity check is still performed in
   intel_gvt_dma_map_guest_page() --> gvt_pin_guest_page().
   It's more desirable to call VFIO interface to do both validity check
   and mapping.
   Calling intel_gvt_is_valid_gfn() to do GFN validity check from KVM side
   while later mapping the GFN through VFIO interface is unnecessarily
   fragile and confusing for unaware readers.

Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 31 -------------------------------
 1 file changed, 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 445afecbe7ae..9b6c2ca1ee16 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,22 +49,6 @@
 static bool enable_out_of_sync = false;
 static int preallocated_oos_pages = 8192;

-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
-       struct kvm *kvm = vgpu->vfio_device.kvm;
-       int idx;
-       bool ret;
-
-       if (!vgpu->attached)
-               return false;
-
-       idx = srcu_read_lock(&kvm->srcu);
-       ret = kvm_is_visible_gfn(kvm, gfn);
-       srcu_read_unlock(&kvm->srcu, idx);
-
-       return ret;
-}
-
 /*
  * validate a gm address and related range size,
  * translate it to host gm address

@@ -1345,13 +1329,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
                        ppgtt_generate_shadow_entry(&se, s, &ge);
                        ppgtt_set_shadow_entry(spt, &se, i);
                } else {
-                       gfn = ops->get_pfn(&ge);
-                       if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                               ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-                               ppgtt_set_shadow_entry(spt, &se, i);
-                               continue;
-                       }
-
                        ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
                        if (ret)
                                goto fail;
@@ -2326,14 +2303,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
                m.val64 = e.val64;
                m.type = e.type;

-               /* one PTE update may be issued in multiple writes and the
-                * first write may not construct a valid gfn
-                */
-               if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                       ops->set_pfn(&m, gvt->gtt.scratch_mfn);
-                       goto out;
-               }
-
                ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
                                                   &dma_addr);
                if (ret) {
--
2.17.1