On Thu, Jul 23, 2020 at 12:35:44PM -0600, Alex Williamson wrote:
> On Thu, 23 Jul 2020 08:57:11 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> > On Tue, Jul 21, 2020 at 10:00:36AM -0600, Alex Williamson wrote:
> > > On Mon, 20 Jul 2020 20:03:19 -0700
> > > Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > > +Weijiang
> > > >
> > > > On Mon, Jul 13, 2020 at 12:06:50PM -0700, Sean Christopherson wrote:
> > > > > The only ideas I have going forward are to:
> > > > >
> > > > >   a) Reproduce the bug outside of your environment and find a resource that
> > > > >      can go through the painful bisection.
> > > >
> > > > We're trying to reproduce the original issue in the hopes of bisecting, but
> > > > have not yet discovered the secret sauce. A few questions:
> > > >
> > > >   - Are there any known hardware requirements, e.g. specific flavor of GPU?
> > >
> > > I'm using an old GeForce GT635, I don't think there's anything special
> > > about this card.
> >
> > Would you be able to provide your QEMU command line? Or at least any
> > potentially relevant bits? Still no luck reproducing this on our end.

*sigh* The "good" news is that we were able to reproduce and bisect the
"fix". The bad news is that the "fix" is the fracturing of large pages done
to mitigate the iTLB multi-hit bug, added by commit b8e8c8303ff2 ("kvm: mmu:
ITLB_MULTIHIT mitigation"). The GPU pass-through failures can be reproduced
by loading KVM with kvm.nx_huge_pages=0.

So, we have another data point, but still no clear explanation of exactly
what is broken.