On 21/01/21 22:32, Sean Christopherson wrote:
Coming back to this series, I wonder if the RCU approach is truly necessary to
get the desired scalability. If both zap_collapsible_sptes() and NX huge page
recovery zap_only_ leaf SPTEs, then the only path that can actually unlink a
shadow page while holding the lock for read is the page fault path that installs
a huge page over an existing shadow page.
Assuming the above analysis is correct, I think it's worth exploring alternatives
to using RCU to defer freeing the SP memory, e.g. promoting to a write lock in
the specific case of overwriting a SP (though that may not exist for rwlocks),
or maybe something entirely different?
You can do the deferred freeing with a short write-side critical section
to ensure all readers have terminated.
If the bool argument to handle_disconnected_tdp_mmu_page is true(*), the
pages would be added to an llist, instead of being freed immediately.
At the end of a shared critical section you would do
if (!llist_empty(&kvm->arch.tdp_mmu_disconnected_pages)) {
struct llist_node *first;
kvm_mmu_lock(kvm);
first = __list_del_all(&kvm->arch.tdp_mmu_disconnected_pages);
kvm_mmu_unlock(kvm);
/*
* All vCPUs have already stopped using the pages when
* their TLBs were flushed. The exclusive critical
* section above means that there can be no readers
* either.
*/
tdp_mmu_free_disconnected_pages(first);
}
So this is still deferred reclamation, but it's done by one of the vCPUs
rather than a worker RCU thread. This would replace patches 11/12/13
and probably would be implemented after patch 18.
Paolo
(*) this idea is what prompted the comment about s/atomic/shared/