On Wed, Aug 21, 2024, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> By performing the invalidation from the 'end' hook, the process is a lot
> cleaner and simpler because it is guaranteed that ->needs_invalidation
> will be cleared after hva_to_pfn() has been called, so the only thing
> that hva_to_pfn_retry() needs to do is *wait* for there to be no pending
> invalidations.
>
> Another false positive on the range based checks is thus removed as well.

...

> @@ -261,6 +239,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
>  			goto out_error;
>  		}
>
> +		/*
> +		 * Avoid populating a GPC with a user address which is already
> +		 * being invalidated, if the invalidate_range_start() notifier
> +		 * has already been called. The actual invalidation happens in
> +		 * the invalidate_range_end() callback, so wait until there are
> +		 * no active invalidations (for the relevant HVA).
> +		 */
> +		gpc_wait_for_invalidations(gpc);

I'm not convinced scheduling out the vCPU is actually a good idea.  At first
blush, it sounds good, e.g. why consume CPU cycles when forward progress is
blocked?  But scheduling out the vCPU will likely negatively impact latency,
and KVM can't predict when the invalidation will complete.  E.g. the refresh()
could have come along at the start of the invalidation, but it also could have
arrived at the tail end.

And if the vCPU is the only runnable task on the CPU, and the system is under
enough load to trigger an invalidation, then scheduling out won't provide any
benefit because odds are good the processor won't be able to get into a deep
enough sleep state to allow other logical CPUs to hit higher turbo bins.

The wait+schedule() logic for the memslots update is purely about being
deliberately _unfair_ to avoid a deadlock (the original idea was to simply use
a r/w semaphore to block memslot updates, but lock fairness led to a possible
deadlock).
If we want to not busy-wait, then we should probably do something along the
lines of halt-polling, e.g. busy-wait for a bit and then schedule() out.  But
even that would require tuning, and I highly doubt that there will be enough
colliding invalidations and retries to build sufficient state to allow KVM to
make a "good" decision.

If you (or anyone else) have numbers to show that blocking until the
invalidation goes away provides meaningful value in most cases, then by all
means.  But without numbers, I don't think we should go this route as it's
not a clear win.

> 		write_lock_irq(&gpc->lock);
>
> @@ -269,6 +255,13 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> 		 * attempting to refresh.
> 		 */
> 		WARN_ON_ONCE(gpc->valid);
> +
> +		/*
> +		 * Since gfn_to_pfn_cache_invalidate() is called from the
> +		 * kvm_mmu_notifier_invalidate_range_end() callback, it can
> +		 * invalidate the GPC the moment after hva_to_pfn() returned
> +		 * a valid PFN. If that happens, retry.
> +		 */
> 	} while (!gpc->needs_invalidation);
>
> 	gpc->valid = true;
> --
> 2.44.0
>