On Mon, Jul 13, 2020 at 12:22:26PM -0600, Alex Williamson wrote:
> On Thu, 9 Jul 2020 21:29:22 -0700
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> > +Alex, whom I completely spaced on Cc'ing.
> >
> > Alex, this is related to the dreaded VFIO memslot zapping issue from last
> > year. Start of thread: https://patchwork.kernel.org/patch/11640719/.
> >
> > The TL;DR of below: can you try the attached patch with your reproducer
> > from the original bug[*]? I honestly don't know whether it has a legitimate
> > chance of working, but it's the one thing in all of this that I know was
> > definitely a bug. I'd like to test it out if only to sate my curiosity.
> > Absolutely no rush.
>
> Mixed results, maybe you can provide some guidance. Running this
> against v5.8-rc4, I haven't reproduced the glitch. But it's been a
> long time since I tested this previously, so I went back to v5.3-rc5 to
> make sure I still have a recipe to trigger it. I can still get the
> failure there as the selective flush commit was reverted in rc6. Then
> I wondered, can I take broken v5.3-rc5 and apply this fix to prove that
> it works? No, v5.3-rc5 + this patch still glitches. So I thought
> maybe I could make v5.8-rc4 break by s/true/false/ in this patch.
> Nope. Then I applied the original patch from[1] to try to break it.
> Nope. So if anything, I think the evidence suggests this was broken
> elsewhere and is now fixed, or maybe it is a timing issue that I can't
> trigger on newer kernels. If the reproducer wasn't so touchy and time
> consuming, I'd try to bisect, but I don't have that sort of bandwidth.

Ow. That manages to be both a best-case and worst-case scenario. I can't
think of any clever way to avoid bisecting. There have been a number of
fixes in tangentially related code since 5.3, e.g. memslots, MMU, TLB,
etc., but trying to isolate which one, if any of them, fixed the bug has
a high probability of being a wild goose chase.
The only ideas I have going forward are to:

  a) Reproduce the bug outside of your environment and find a resource
     that can go through the painful bisection.

  b) Add a module param to toggle the new behavior and see if anything
     breaks.

I can ask internally if it's possible to get a resource on my end to go
after (a). (b) is a question for Paolo.

Thanks much for testing!

> Thanks,
>
> Alex
>
> [1] https://patchwork.kernel.org/patch/10798453/