On Sat, 2017-05-06 at 02:58 +0000, Zhang, Xiong Y wrote:
> > On Wed, 2017-05-03 at 09:22 +0000, Zhang, Xiong Y wrote:
> > > > > + David and Jon
> > > > >
> > > > > On Tue, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> > > > >
> > > > > The blocking issue I see is that bisecting is still not pointing at
> > > > > relevant commits. Both bisected commits from Bugzilla are not related
> > > > > to changes in stolen memory usage behavior. I'd assume a successful
> > > > > bisect to land at the patches where we start creating kernel internal
> > > > > objects from stolen memory. Otherwise we could be ignoring a bug
> > > > > elsewhere. If it consistently lands on those patches, then there might
> > > > > be something wrong with them, in addition to stolen memory problems.
> > > > [Zhang, Xiong Y] I only try kernel 4.8 and 4.9 above, as the bugzilla
> > > > descripted, guest 4.8 kernel doesn't see gpu hang in guest dmesg, 4.9
> > > > kernel has gpu hang in guest dmesg. From this point, we could do git
> > > > bisect.
> > > > But tons of IOMMU DMA R/W exception to stolen memory exist in host dmesg
> > > > when guest kernel is 4.8 and 4.9. This means guest domain iommu table
> > > > doesn't have mapping for stolen memory and IGD fail in accessing stolen
> > > > memory from guest kernel 4.8 and 4.9. From this point, this issue isn't
> > > > a regression and shouldn't go git bisect. You could check this host
> > > > error message from the bugzilla attachment. And this should be fixed
> > > > first.
> > > > Anyway, I will try my best to get the ideal commit through git bisect,
> > > > but I'm afraid the result is the same as past because we don't have a
> > > > stable good point to start git bisect.
> > > [Zhang, Xiong Y] hi, Joonas:
> > > As you said, the gpu hang exist because i915 create ring buffer from
> > > stolen memory.
> > >
> > > I did git bisect again, and the following commit is the first bad commit:
> > > commit c58b735fc762e891481e92af7124b85cb0a51fce
> > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Date:   Thu Aug 18 17:16:57 2016 +0100
> > >
> > >     drm/i915: Allocate rings from stolen
> > >
> > >     If we have stolen available, make use of it for ringbuffer allocation.
> > >     Previously this was restricted to !llc platforms, as writing to stolen
> > >     requires a GGTT mapping - but now that we have partial mappable support,
> > >     the mappable aperture isn't quite so precious so we can use it more
> > >     freely and ringbuffers are a good user for the otherwise wasted stolen.
> > >
> > > After reverting this patch from drm-intel-nightly, I didn't see gpu hang
> > > during guest boot process.
> > >
> > > So what's our next step ?
> >
> > An appropriate next step would be to evaluate how much work it is to
> > support the RMRR passthrough David mentioned about in his commit.
> [Zhang, Xiong Y] As Kevin explained, KVM community found the disadvantage
> Of RMRR and have decided to not support RMRR passthrough, so it is really hard
> for us to push such solution and isn't related to the workload.
> Except usb and graphic card, all other devices with RMRR couldn't passthrough
> to guest. But the driver of usb and graphic card couldn't access RMRR in such
> environment.
> https://access.redhat.com/sites/default/files/attachments/rmrr-wp1.pdf

Does this patch have the right Cc's from the KVM team? I'd like to hear
directly from them that even the usage of RMRRs that follow the
intention of the VT-d spec is not going to be supported. That document
predates the patches that added the exclusion for graphics.

> > I'd also go talk with the IGD team, why they refuse to load the driver
> > when stolen memory is correctly reported as zero, and insist on being
> > lied to.
> [Zhang, Xiong Y] thanks a lot for doing so.

I don't have the contacts, so I assume you will pursue that.

Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation