On 16/01/2024 18:20, Sean Christopherson wrote:
>> Does this make sense to you? Happy to double-check or run more tests if
>> anything seems off.
>
> Ha! It took me a few minutes to realize what went sideways with v2. KVM has an
> in-flight change that switches from host virtual addresses (HVA) to guest physical
> frame numbers (GFN) for the retry check, commit 8569992d64b8 ("KVM: Use gfn instead
> of hva for mmu_notifier_retry").
>
> That commit is in the KVM pull request for 6.8, and so v2 is based on top of a
> branch that contains said commit. But for better or worse (probably worse), the
> switch from HVA=>GFN didn't change the _names_ of mmu_invalidate_range_{start,end},
> only the type. So v2 applies and compiles cleanly on 6.7, but it's subtly broken
> because checking for a GFN match against an HVA range is all but guaranteed to get
> false negatives.

Oof, that's nifty, good catch! I'll pay more attention to the base-commit
when testing next time. :)

> If you can try v2 on top of `git://git.kernel.org/pub/scm/virt/kvm/kvm.git next`,
> that would be helpful to confirm that I didn't screw up something else.

Pulled that repository and can confirm:

* 1c6d984f ("x86/kvm: Do not try to disable kvmclock if it was not
  enabled", current `next`): reproducer hangs
* v2 [1] ("KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping
  is changing") applied on top of 1c6d984f: no hangs anymore

If I understand the discussion on [1] correctly, there might be a v3 --
if so, I'll happily test that too.

> Thanks very much for reporting back! I'm pretty sure we would have missed the
> semantic conflict when backporting the fix to 6.7 and earlier, i.e. you likely
> saved us from another round of bug reports for various stable trees.

Sure! Thanks a lot for taking a look at this!

Best wishes,

Friedrich

[1] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@xxxxxxxxxx/