Re: [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm

Alex Williamson <alex.williamson@xxxxxxxxxx> · Mon, 8 May 2017 09:01:08 -0600

On Mon, 08 May 2017 13:07:10 +0300
Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> wrote:

> On Sat, 2017-05-06 at 02:58 +0000, Zhang, Xiong Y wrote:
> > > On Wed, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:
> > > > > > + David and Jon
> > > > > >
> > > > > > On Tue, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> > > > > >
> > > > > > The blocking issue I see is that bisecting is still not pointing at
> > > > > > relevant commits. Both bisected commits from Bugzilla are not related
> > > > > > to changes in stolen memory usage behavior. I'd assume a successful
> > > > > > bisect to land at the patches where we start creating kernel internal
> > > > > > objects from stolen memory. Otherwise we could be ignoring a bug
> > > > > > elsewhere. If it consistently lands on those patches, then there might
> > > > > > be something wrong with them, in addition to stolen memory problems.  
> > > > > [Zhang, Xiong Y] I only tried kernel 4.8 and 4.9 above. As the Bugzilla
> > > > > entry describes, a guest 4.8 kernel doesn't show a GPU hang in guest
> > > > > dmesg, while a 4.9 kernel does. From this point of view, we could do a
> > > > > git bisect.
> > > > > But tons of IOMMU DMA R/W exceptions to stolen memory exist in host
> > > > > dmesg when the guest kernel is 4.8 or 4.9. This means the guest domain
> > > > > IOMMU table doesn't have a mapping for stolen memory, and IGD fails to
> > > > > access stolen memory from guest kernels 4.8 and 4.9. From this point of
> > > > > view, this issue isn't a regression and shouldn't go through git bisect.
> > > > > You can check this host error message in the Bugzilla attachment. And
> > > > > this should be fixed first.
> > > > > Anyway, I will try my best to get the ideal commit through git bisect,
> > > > > but I'm afraid the result will be the same as before because we don't
> > > > > have a stable good point to start git bisect from.
> > > > [Zhang, Xiong Y] hi, Joonas:
> > > > As you said, the GPU hang exists because i915 creates the ring buffer from
> > > > stolen memory.
> > > >
> > > > I did git bisect again, and the following commit is the first bad commit:
> > > > commit c58b735fc762e891481e92af7124b85cb0a51fce
> > > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Date:   Thu Aug 18 17:16:57 2016 +0100
> > > > 
> > > >     drm/i915: Allocate rings from stolen
> > > > 
> > > >     If we have stolen available, make use of it for ringbuffer allocation.
> > > >     Previously this was restricted to !llc platforms, as writing to stolen
> > > >     requires a GGTT mapping - but now that we have partial mappable support,
> > > >     the mappable aperture isn't quite so precious so we can use it more
> > > >     freely and ringbuffers are a good user for the otherwise wasted stolen.
> > > > 
> > > > After reverting this patch from drm-intel-nightly, I didn't see a GPU hang
> > > > during the guest boot process.
> > > >
> > > > So what's our next step?
> > > 
> > > An appropriate next step would be to evaluate how much work it is to
> > > support the RMRR passthrough David mentioned in his commit.
> > [Zhang, Xiong Y] As Kevin explained, the KVM community found the
> > disadvantages of RMRR and has decided not to support RMRR passthrough, so
> > it is really hard for us to push such a solution, and it isn't related to
> > the workload.
> > Except for USB and graphics cards, no other devices with RMRRs can be passed
> > through to a guest. But the USB and graphics drivers can't access the RMRR
> > in such an environment.
> > https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf
> 
> Does this patch have the right Cc's from the KVM team? I'd like to hear
> directly from them that even usage of RMRRs that follows the intention
> of the VT-d spec is not going to be supported. That document predates
> the patches to add the exclusion for graphics.

I'm the QEMU and kernel vfio maintainer and co-author of the above
whitepaper.  Even the VT-d spec suggests that usage of RMRRs should be
limited (rev 2.3, 8.4):

  Platform designers should avoid or limit use of reserved memory
  regions since these require system software to create holes in the
  DMA virtual address range available to system software and its
  drivers.

Also, if you read the entire thread which added the graphics exception
for RMRR, you'll see that it went in under some degree of protest and
ultimately under the conclusion that we should just ignore the RMRR
anyway:

https://lists.linuxfoundation.org/pipermail/iommu/2015-April/012790.html

At least for the case of IGD RMRRs, we don't expect that they're used
for health monitoring of the devices via back channels as the interface
has been abused to do elsewhere, so ignoring the RMRR and not
supporting it in the VM means that the device is only impairing itself,
which is fine.  I would balk at trying to add RMRR support into vfio
and the virt stack for the reasons outlined in the above whitepaper.
Thanks,

Alex