> + David and Jon
>
> On Tue, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
>
> The blocking issue I see is that bisecting is still not pointing at
> relevant commits. Both bisected commits from Bugzilla are not related
> to changes in stolen memory usage behavior. I'd assume a successful
> bisect to land at the patches where we start creating kernel internal
> objects from stolen memory. Otherwise we could be ignoring a bug
> elsewhere. If it consistently lands on those patches, then there might
> be something wrong with them, in addition to stolen memory problems.

[Zhang, Xiong Y] I have only tried kernel 4.8 and kernels from 4.9 onward. As the Bugzilla entry describes, a 4.8 guest kernel shows no GPU hang in the guest dmesg, while a 4.9 guest kernel does. From that point of view, we could do a git bisect.

However, the host dmesg shows tons of IOMMU DMA read/write exceptions on stolen memory with both the 4.8 and the 4.9 guest kernel. This means the guest domain IOMMU table has no mapping for stolen memory, and IGD fails to access stolen memory under both guest kernels. From that point of view, this issue isn't a regression and shouldn't be bisected. You can check the host error messages in the Bugzilla attachment. This should be fixed first.

Anyway, I will try my best to find the right commit through git bisect, but I'm afraid the result will be the same as before, because we don't have a stable good point to start the bisect from.

> Disabling power saving makes many bugs go away, but we still don't
> disable power saving as a resolution to such bugs, but instead root
> cause and fix the individual bugs.

[Zhang, Xiong Y] I added i915.enable_rc6=0, i915.enable_dc=0, i915.enable_fbc=0, i915.enable_psr=0, i915.disable_power_well=0 and i915.enable_ips=0 to the kernel command line in grub. The GPU hang still occurs in the guest, and the DMA read/write errors still appear in the host.

>
> > Stolen memory isn't a standard pci resource and exists in RMRR which has
> > identity mapping in iommu table when host boot up, so IGD could access
> > stolen memory in host OS.
> > While according to 'commit c875d2c1b808
> > ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains")', RMRR
> > isn't supported by kvm, then both EPT and guest iommu domain table lack
> > of mapping for stolen memory in kvm IGD passthrough environment.
>
> Commit message text still fails to address that an exclusion was added
> by commit:
>
> commit 18436afdc11a00ac881990b454cfb2eae81d6003
> Author: David Woodhouse <David.Woodhouse@xxxxxxxxx>
> Date:   Wed Mar 25 15:05:47 2015 +0000
>
>     iommu/vt-d: Allow RMRR on graphics devices too
>
>     Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from
>     IOMMU API domains") prevents certain options for devices with
>     RMRRs. This even prevents those devices from getting a 1:1
>     mapping with 'iommu=pt', because we don't have the code to handle
>     *preserving* the RMRR regions when moving the device between
>     domains.
>
>     <SNIP>
>
> The quoted part of David's commit message leads me to believe it's
> simply lack of some code in kernel for juggling the RMRRs when moving a
> device between domains that is missing. Why is not that considered
> instead? With that implemented, we would have more transparent
> pass-through, which should be good.

[Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") prevents devices associated with RMRRs from being assigned to a guest. One of the reasons is that RMRRs aren't supported in the guest domain IOMMU table, so if the driver of such a device still accesses the RMRR from the guest, serious errors will happen.

18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too") adds an exception to the above commit, so IGD can be assigned to a guest. But this doesn't mean that a 1:1 mapping for the IGD RMRR is supported in the guest domain IOMMU table.

'iommu=pt' sets up a 1:1 mapping for all PCI devices in the host domain IOMMU table. When a device is assigned to a guest and that guest boots, the guest domain IOMMU table takes the place of the host domain IOMMU table on the hardware.
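
As a rough illustration of the domain switch described above, here is a toy model in plain user-space C. It is not kernel code: every name (`toy_domain`, `toy_dma_translate`, and so on) is made up for this sketch and the addresses are arbitrary. It only shows why a DMA access to an RMRR range that is identity mapped in the host domain faults once the device sits in a guest domain that lacks that mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous IOVA -> physical mapping in a toy IOMMU domain. */
struct toy_mapping {
	uint64_t iova;
	uint64_t phys;
	uint64_t size;
};

/* A toy IOMMU domain: just a fixed list of mappings. */
struct toy_domain {
	const struct toy_mapping *maps;
	size_t nr_maps;
};

/* Hypothetical RMRR (stolen memory) range; the values are arbitrary. */
#define TOY_RMRR_BASE 0x7b800000ULL
#define TOY_RMRR_SIZE 0x04000000ULL /* 64 MiB */

/* Host domain: the RMRR is identity mapped (iova == phys). */
static const struct toy_mapping toy_host_maps[] = {
	{ TOY_RMRR_BASE, TOY_RMRR_BASE, TOY_RMRR_SIZE },
};

/* Guest domain: only 1 GiB of guest RAM is mapped, the RMRR is absent. */
static const struct toy_mapping toy_guest_maps[] = {
	{ 0x00000000ULL, 0x100000000ULL, 0x40000000ULL },
};

static const struct toy_domain toy_host_domain = {
	toy_host_maps, sizeof(toy_host_maps) / sizeof(toy_host_maps[0]),
};

static const struct toy_domain toy_guest_domain = {
	toy_guest_maps, sizeof(toy_guest_maps) / sizeof(toy_guest_maps[0]),
};

/*
 * Translate a device DMA address through the given domain.
 * Returns true and stores the physical address on success,
 * false when no mapping covers the address (a DMA fault).
 */
static bool toy_dma_translate(const struct toy_domain *dom, uint64_t iova,
			      uint64_t *phys)
{
	for (size_t i = 0; i < dom->nr_maps; i++) {
		const struct toy_mapping *m = &dom->maps[i];

		if (iova >= m->iova && iova - m->iova < m->size) {
			*phys = m->phys + (iova - m->iova);
			return true;
		}
	}
	return false;
}

/* The same stolen memory address works on the host but faults in the guest. */
void toy_demo(void)
{
	uint64_t phys = 0;

	assert(toy_dma_translate(&toy_host_domain, TOY_RMRR_BASE, &phys));
	assert(!toy_dma_translate(&toy_guest_domain, TOY_RMRR_BASE, &phys));
}
```

In this model the failing lookup in `toy_guest_domain` stands in for the IOMMU DMA read/write exceptions seen in the host dmesg.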

Our issue is that the guest domain IOMMU table doesn't have a 1:1 mapping for the RMRR. To set up a 1:1 mapping for the RMRR in the guest domain IOMMU table, we would have to modify kvm and qemu, and the kvm community has declined this.

>
> Also, was fixing the IGD driver loading with zero stolen memory
> considered instead? All this information should exist in the commit
> message.

[Zhang, Xiong Y] The IGD driver and the i915 driver read PCI config register 0x50 to get the size of stolen memory. When the guest reads this register, qemu can trap the access and return a value of its choosing to the guest. So in order to "fix the IGD driver loading with zero stolen memory", we would have to modify both qemu and the IGD driver:

1) QEMU: trap reads of PCI config register 0x50 and return zero to the guest.
2) IGD driver: when the driver sees a stolen memory size of zero, don't abort loading but continue.

This doesn't give any benefit to i915, because i915 will still disable stolen memory when it sees a zero-sized stolen memory. So I prefer to disable stolen memory in i915 directly and keep qemu and the IGD driver unchanged.

>
> After the bisecting is properly done, there is an agreement that
> suggested RMRR preservation is absolutely a no-go, other options are
> not viable, the commit message should be updated to reflect all that.
> Then we should look in more detail on how to detect the scenarios when
> we're running in a virtual machine that doesn't set up the 1:1 mapping
> for RMRRs.

[Zhang, Xiong Y] Sure, I will do this once we have an agreement. I really need help from others who can correct me if I am wrong.

>
> Regards, Joonas
> --
> Joonas Lahtinen
> Open Source Technology Center
> Intel Corporation
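
The two-step alternative from the reply above (qemu returns zero for config register 0x50, and the driver keeps loading when it sees a zero size) can be sketched roughly as follows. This is a user-space illustration, not i915 or qemu code: all function and type names are hypothetical, and the field layout (size in bits 15:8 of a 16-bit register, in 32 MiB steps) is an assumption modeled on Broadwell-class hardware; other generations encode the size differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 16-bit register at PCI config offset 0x50. */
#define TOY_GGC_OFFSET 0x50
#define TOY_GMS_SHIFT  8
#define TOY_GMS_MASK   0xffu
#define TOY_GMS_UNIT   (32ULL << 20) /* 32 MiB per step (assumption) */

/* Decode the stolen memory size in bytes from a raw register value. */
static uint64_t toy_stolen_size(uint16_t ggc)
{
	uint64_t gms = (ggc >> TOY_GMS_SHIFT) & TOY_GMS_MASK;

	return gms * TOY_GMS_UNIT;
}

/* Step 1, hypervisor side: trap the read and report zero to the guest. */
static uint16_t toy_trap_ggc_read(uint16_t host_value)
{
	(void)host_value; /* the real value is hidden from the guest */
	return 0;
}

/* Minimal stand-in for the driver state that matters here. */
struct toy_driver_state {
	bool loaded;
	bool stolen_enabled;
	uint64_t stolen_size;
};

/*
 * Step 2, driver side: a zero size disables stolen memory,
 * but loading continues instead of failing.
 */
static void toy_driver_init(struct toy_driver_state *st, uint16_t ggc)
{
	st->stolen_size = toy_stolen_size(ggc);
	st->stolen_enabled = st->stolen_size != 0;
	st->loaded = true;
}
```

Calling `toy_driver_init()` with the value from `toy_trap_ggc_read()` ends with the driver loaded and stolen memory disabled, which is the same end state as disabling stolen memory in the driver directly. That is the reason given above for not touching qemu.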