Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Mon, 9 Jan 2023 17:58:16 +0800




On Fri, Jan 06, 2023 at 11:01:53PM +0000, Sean Christopherson wrote:
> On Fri, Jan 06, 2023, Yan Zhao wrote:
> > On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> > > On Thu, Jan 05, 2023, Yan Zhao wrote:
> > > I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> > > and permissions, and that the only requirement for KVM memslots is that GTT page
> > > tables need to be visible in KVM's memslots.  But if that's the ABI, then
> > > intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> > > ("drm/i915/gvt: validate gfn before set shadow page entry").
> > > 
> > > In other words, pick either VFIO or KVM.  Checking that X is valid according to
> > > KVM and then mapping X through VFIO is confusing and makes assumptions about how
> > > userspace configures KVM and VFIO.  It works because QEMU always configures KVM
> > > and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> > > unaware readers because the code is technically flawed.
> > >
> > Agreed. 
> > Then after some further thought, I think maybe we can just remove
> > intel_gvt_is_valid_gfn() in KVMGT, because
> > 
> > (1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
> > ppgtt_populate_spt() are not for page track purpose, but to validate bogus
> > GFN.
> > (2) gvt_pin_guest_page() with gfn and size can do the validity checking,
> > which is called in intel_gvt_dma_map_guest_page(). So, we can move the
> > mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
> 
> IIUC, that will re-introduce the problem commit cc753fbe1ac4 ("drm/i915/gvt: validate
> gfn before set shadow page entry") solved by poking into KVM.  Lack of pre-validation
> means that bogus GFNs will trigger error messages, e.g.
> 
> 			gvt_vgpu_err("vfio_pin_pages failed for iova %pad, ret %d\n",
> 				     &cur_iova, ret);
> 
> and
> 
> 			gvt_vgpu_err("fail to populate guest ggtt entry\n");

Thanks for pointing it out.
I checked the commit message of cc753fbe1ac4 and found the original intentions
behind introducing the pre-validation:
   "GVT may receive partial write on one guest PTE update. Validate gfn
    not to translate incomplete gfn. This avoids some unnecessary error
    messages incurred by the incomplete gfn translating. Also fix the
    bug that the whole PPGTT shadow page update is aborted on any invalid
    gfn entry"

(1) The first intention -- the unnecessary error messages came from GGTT
    partial writes.
    For guest GGTT writes, the guest calls writeq to an MMIO GPA, which is
    8 bytes in length, while QEMU splits the MMIO write into two 4-byte writes.
    Handling the two split writes separately can cause an invalid GFN to be
    found.

    But this partial write issue has been fixed by the two follow-up commits:
        bc0686ff5fad drm/i915/gvt: support inconsecutive partial gtt entry write
        510fe10b6180 drm/i915/gvt: fix a bug of partially write ggtt enties

    So pre-validation to reduce noise is no longer necessary here.
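
To illustrate how a split write can produce a bogus GFN, here is a minimal
userspace sketch. The PTE address mask, the 6 GiB guest size and all helper
names are assumptions made up for this illustration; they are not the real
i915/GVT definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: address bits 12..38 of a 64-bit GTT entry. */
#define SKETCH_ADDR_MASK  0x0000007ffffff000ULL
#define SKETCH_PAGE_SHIFT 12
/* Hypothetical guest with 6 GiB of RAM: GFNs [0, 0x180000) are valid. */
#define SKETCH_MAX_GFN    0x180000ULL

static uint64_t sketch_pte_to_gfn(uint64_t pte)
{
	return (pte & SKETCH_ADDR_MASK) >> SKETCH_PAGE_SHIFT;
}

static bool sketch_gfn_is_valid(uint64_t gfn)
{
	return gfn < SKETCH_MAX_GFN;
}

/*
 * Apply one 4-byte write at byte offset 0 (low half) or 4 (high half)
 * of a little-endian 8-byte entry, as happens when an 8-byte MMIO write
 * is delivered as two 4-byte writes.
 */
static uint64_t sketch_apply_partial_write(uint64_t entry, unsigned int off,
					   uint32_t val)
{
	if (off == 0)
		return (entry & 0xffffffff00000000ULL) | val;
	return (entry & 0x00000000ffffffffULL) | ((uint64_t)val << 32);
}
```

With an old entry of 0x0000000100000003 (GFN 0x100000) and a new entry of
0x00000000fffff003 (GFN 0xfffff), both GFNs are valid in this model, but the
intermediate value after only the low half is written is 0x00000001fffff003,
i.e. GFN 0x1fffff, which lies outside the assumed guest RAM.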

(2) The second intention -- "the whole PPGTT shadow page update is aborted on
    any invalid gfn entry".
    As the PPGTT resides in normal guest RAM and we only treat 8-byte writes
    as valid page table writes, any invalid GPA found is regarded as
    an error, either due to guest misbehavior/attack or a bug in the host
    shadow code.
    So, aborting directly looks good too, like below:

@@ -1340,13 +1338,6 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
                        ppgtt_generate_shadow_entry(&se, s, &ge);
                        ppgtt_set_shadow_entry(spt, &se, i);
                } else {
-                       gfn = ops->get_pfn(&ge);
-                       if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
-                               ops->set_pfn(&se, gvt->gtt.scratch_mfn);
-                               ppgtt_set_shadow_entry(spt, &se, i);
-                               continue;
-                       }
-
                        ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
                        if (ret)
                                goto fail;

(I actually found that the original code prints an "invalid entry type"
warning, which indicates that this invalid-GFN path has been broken for a
while due to lack of testing.)


> One thought would be to turn those printks into tracepoints to eliminate unwanted
> noise, and to prevent the guest from spamming the host kernel log by programming
> garbage into the GTT (gvt_vgpu_err() isn't ratelimited).
As those printks would not happen under normal conditions, and printks have
some advantage in helping to discover an attack or a bug, could we just
convert gvt_vgpu_err() to be ratelimited?
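
For reference, the interval/burst behaviour that ratelimiting would give can
be modelled in userspace roughly as below. This is only a sketch: the struct
and function names are hypothetical, time is passed in as an abstract
counter, and it loosely mimics (rather than reproduces) what the kernel's
ratelimit helpers such as pr_err_ratelimited() do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct sketch_rl_state {
	uint64_t interval;	/* window length, in abstract time units */
	unsigned int burst;	/* messages allowed per window */
	uint64_t begin;		/* start time of the current window */
	unsigned int printed;	/* messages emitted in the current window */
	unsigned int missed;	/* messages suppressed so far */
};

/* Return true if a message may be emitted at time 'now'. */
static bool sketch_rl_allow(struct sketch_rl_state *rs, uint64_t now)
{
	/* Start a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over budget: drop the message but keep count of it. */
	rs->missed++;
	return false;
}
```

With interval = 5 and burst = 2, a guest that triggers the error five times
at times 0, 0, 1, 4 and 5 gets three messages through (two in the first
window, one in the next) and two are suppressed, so an attack or a bug is
still visible in the log without the guest being able to flood it.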

Thanks
Yan

> 
> > > On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
> > > gfn.  If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
> > > to handle the case where the guest creates a bogus mapping when writing an existing
> > > GTT PT.
> > Don't get it here. Could you elaborate more?
> 
> AFAICT, KVMGT only pre-validates the GFN on the initial setup, not when the guest
> modifies a write-tracked entry.  I believe this is a moot point if the pre-validation
> is removed entirely.
> 
> > > 	gfn = pte_ops->get_pfn(ge);
> > > 	if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
> > > 		goto set_shadow_entry;
> > As KVMGT only tracks PPGTT page table pages, this check here is not for page
> > track purpose, but to check bogus GFN.
> > So, Just leave the bogus GFN check to intel_gvt_dma_map_guest_page() through
> > VFIO is all right.
> > 
> > On the other hand, for the GFN validity for page track purpose, we can
> > leave it to kvm_write_track_add_gfn().
> > 
> > Do you think it's ok?
> 
> Yep, the only hiccup is the gvt_vgpu_err() calls that are guest-triggerable, and
> converting those to a tracepoint seems like the right answer.