On Tue, Oct 24, 2017 at 5:39 PM, geoff--- via iommu <iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 2017-10-25 08:31, Alex Williamson wrote:
>>
>> On Wed, 25 Oct 2017 07:16:46 +1100
>> geoff--- via iommu <iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> I have isolated it to a single change, although I do not completely
>>> understand what other implications it might have.
>>>
>>> By just changing the line in `init_vmcb` that reads:
>>>
>>>     save->g_pat = svm->vcpu.arch.pat;
>>>
>>> to:
>>>
>>>     save->g_pat = 0x0606060606060606;
>>>
>>> this enables write-back and performance jumps through the roof.
>>>
>>> This needs someone with more experience to write a proper patch that
>>> addresses this in a smarter way rather than just hard-coding the value.
>>>
>>> This patch looks like an attempt to fix the same issue, but it yields
>>> no detectable performance gains:
>>>
>>> https://patchwork.kernel.org/patch/6748441/
>>>
>>> Any takers?
>>
>> IOMMU is not the right list for such a change. I'm dubious this is
>> correct since you're basically going against the comment immediately
>> previous in the code, but perhaps it's a hint in the right direction.
>> Thanks,
>>
>> Alex
>
> As am I, which is why it needs someone with more experience to figure out
> why this has had such a huge impact. I have been testing everything since
> I made that change and I am finding that everything I throw at it works
> at near-native performance.
>
> I will post my findings to the KVM mailing list as it is clearly a KVM
> issue with SVM; perhaps someone there can write a patch to fix this, or
> at the very least allow for a workaround/quirk module parameter.
>
>>> On 2017-10-25 06:08, geoff@xxxxxxxxxxxxxxx wrote:
>>> > I have identified the issue! With NPT enabled I am now getting near
>>> > bare-metal performance with PCI pass-through. The issue was with some
>>> > stubs that have not been properly implemented. I will clean my code
>>> > up and submit a patch shortly.
>>> >
>>> > This is a 10-year-old bug that has only become evident with the recent
>>> > ability to perform PCI pass-through with dedicated graphics cards. I
>>> > would expect this to improve performance across most workloads that
>>> > use AMD NPT.
>>> >
>>> > Here are some benchmarks to show what I am getting in my dev
>>> > environment:
>>> >
>>> > https://www.3dmark.com/3dm/22878932
>>> > https://www.3dmark.com/3dm/22879024
>>> >
>>> > -Geoff
>>> >
>>> > On 2017-10-24 16:15, geoff@xxxxxxxxxxxxxxx wrote:
>>> >> Further to this, I have verified that the IOMMU is working fine;
>>> >> traces and additional printk's added to the kernel module were used
>>> >> to check. All accesses are successful and hit the correct addresses.
>>> >>
>>> >> However, profiling under Windows shows there might be an issue with
>>> >> IRQs not reaching the guest. When FluidMark is running at 5 fps I
>>> >> still see excellent system responsiveness with the CPU 90% idle and
>>> >> the GPU load at 6%.
>>> >>
>>> >> When switching PhysX to CPU mode the GPU enters low-power mode,
>>> >> indicating that the card is no longer in use. This would seem to
>>> >> confirm that, in GPU mode, the card is indeed being used by the
>>> >> PhysX API correctly.
>>> >>
>>> >> My assumption now is that the IRQs from the video card are getting
>>> >> lost.
>>> >>
>>> >> I could be completely off base here, but at this point it seems like
>>> >> the best way to proceed unless someone cares to comment.
>>> >>
>>> >> -Geoff
>>> >>
>>> >> On 2017-10-24 10:49, geoff@xxxxxxxxxxxxxxx wrote:
>>> >>> Hi,
>>> >>>
>>> >>> I realize this is an older thread, but I have spent much of today
>>> >>> trying to diagnose the problem.
>>> >>>
>>> >>> I have discovered how to reliably reproduce the problem with very
>>> >>> little effort. It seems that reproducing the issue has been hit and
>>> >>> miss for people, as it primarily affects games/programs that make
>>> >>> use of nVidia PhysX. My understanding of NPT's inner workings is
>>> >>> quite primitive, but I have still spent much of my time trying to
>>> >>> diagnose the fault and identify the cause.
>>> >>>
>>> >>> Using the free program FluidMark [1] it is possible to reproduce
>>> >>> the issue: on a GTX 1080Ti the rendering rate drops to around 4 fps
>>> >>> with NPT turned on, but with it turned off the render rate is in
>>> >>> excess of 60 fps.
>>> >>>
>>> >>> I have produced traces with and without NPT enabled during these
>>> >>> tests, which I can provide if it will help. So far I have been
>>> >>> digging through how NPT works and trying to glean as much
>>> >>> information as I can from the source and the AMD specifications,
>>> >>> but much of this and how the MMU works is very new to me, so
>>> >>> progress is slow.
>>> >>>
>>> >>> If anyone else has looked into this and has more information to
>>> >>> share, I would be very interested.
>>> >>>
>>> >>> Kind Regards,
>>> >>> Geoffrey McRae
>>> >>> HostFission
>>> >>> https://hostfission.com
>>> >>>
>>> >>> [1]:
>>> >>> http://www.geeks3d.com/20130308/fluidmark-1-5-1-physx-benchmark-fluid-sph-simulation-opengl-download/

Hi all,

Yeah, I just tested it and I can confirm this works around the GPU
performance hit we've all been seeing. Amazing find, and I'll be happy to
see the final solution be merged upstream one day.

Thanks,
Sarnex
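For anyone who wants to try this while a proper fix is worked out, the
change under discussion is the guest PAT setup in the nested-paging branch
of init_vmcb() in arch/x86/kvm/svm.c. Below is a minimal sketch of the
workaround based only on the two lines quoted above; the surrounding code
is omitted and the exact context may differ between kernel versions.

    /* In init_vmcb(), inside the npt_enabled branch: */

    /*
     * The stock code mirrors the guest-visible PAT into the VMCB:
     *
     *     save->g_pat = svm->vcpu.arch.pat;
     *
     * The workaround from this thread hard-codes the guest PAT instead.
     * Each byte of g_pat holds one PAT entry, and type 0x06 is
     * write-back (WB), so repeating 0x06 eight times makes every guest
     * memory type resolve to WB under nested paging.
     */
    save->g_pat = 0x0606060606060606;

As Alex points out, this is a blunt workaround rather than a proper fix:
it ignores whatever PAT the guest has actually programmed, so a real patch
would need to honour the guest's own memory-type settings instead of
forcing write-back unconditionally.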