On Tue, Aug 20, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> The existing retry loop in hva_to_pfn_retry() is extremely pessimistic.
> If there are any concurrent invalidations running, it's effectively just
> a complex busy-wait loop, because its local mmu_notifier_retry_cache()
> function will always return true.
>
> Since multiple invalidations can run in parallel, this can result in a
> situation where hva_to_pfn_retry() just backs off and keeps retrying
> forever, never making any progress.
>
> Solve this by being more selective about when to retry.
>
> Introduce a separate 'needs invalidation' flag to the GPC, which allows
> it to be marked invalid even while hva_to_pfn_retry() has dropped the
> lock to map the newly-found PFN. This allows the invalidation to be
> moved to the range_end hook, and the retry loop only occurs for a given
> GPC if its particular uHVA is affected.
>
> However, the contract for invalidate_range_{start,end} is not like a
> simple TLB; the pages may have been freed by the time the end hook is
> called. A "device" may not create new mappings after the _start_ hook
> is called. To meet this requirement, hva_to_pfn_retry() now waits until
> no invalidations which may affect its uHVA are currently running,
> before finally setting the ->valid flag and returning.

Please split this into 3 patches:

  1. Add range-based GPC retry.
  2. Add the wait mechanism.
  3. Add the needs_invalidation logic.

#1 and #2 make sense to me, but I'm struggling to understand why #3 is
needed. KVM absolutely must not touch the memory after
.invalidate_range_start(), so I don't see what is gained by deferring
invalidation to invalidate_range_end().