On 2021/9/4 00:40, Sean Christopherson wrote:
> On Sat, Sep 04, 2021, Lai Jiangshan wrote:
>> On 2021/9/4 00:06, Sean Christopherson wrote:
>>> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
>>> index 50ade6450ace..2ff123ec0d64 100644
>>> --- a/arch/x86/kvm/mmu/paging_tmpl.h
>>> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
>>> @@ -704,6 +704,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>>>  		access = gw->pt_access[it.level - 2];
>>>  		sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
>>>  				      it.level-1, false, access);
>>> +		if (sp->unsync_children &&
>>> +		    mmu_sync_children(vcpu, sp, false))
>>> +			return RET_PF_RETRY;
>> It is like my first (unsent) fix: just return RET_PF_RETRY when the sync
>> breaks out.
>> And then I thought it would be better to retry the fetch directly, rather
>> than retry the guest, when the conditions are still valid/unchanged, so as
>> to avoid redoing the whole guest page walk and GUP().  Although the code
>> did not check all the conditions, such as a pending interrupt event (we
>> could add that too).
> But not in a bug fix that needs to go to stable branches.
Good point; it is too complicated for a fix, so I accept just "return
RET_PF_RETRY" (and "SOME_ARBITRARY_THRESHOLD" is not needed; the sketch
below shows where it would have fit).

Is that OK?  I will update the patch accordingly.
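For reference, the "retry fetching directly" idea was roughly the sketch
below.  It is a hypothetical fragment, not the posted patch: it assumes a
variant of mmu_sync_children() that may yield mmu_lock and returns nonzero
when it did, and the "retry" label, "retries" counter and
SOME_ARBITRARY_THRESHOLD are illustrative placeholders.

	int retries = 0;

retry:
	for (shadow_walk_init(&it, vcpu, fault->addr);
	     shadow_walk_okay(&it) && it.level > gw->level;
	     shadow_walk_next(&it)) {
		/* ... table_gfn/access computed as in FNAME(fetch) ... */
		sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
				      it.level-1, false, access);

		if (sp->unsync_children &&
		    mmu_sync_children(vcpu, sp, true)) {
			/*
			 * mmu_lock was dropped and reacquired, so the
			 * shadow walk may be stale.  The guest page walk
			 * (gw) and the GUP'd pfn are assumed unchanged, so
			 * only the shadow walk is redone.  Cap the retries
			 * so pending events (e.g. interrupts) are not
			 * starved; past the cap, go back to the guest.
			 */
			if (++retries > SOME_ARBITRARY_THRESHOLD)
				return RET_PF_RETRY;
			goto retry;
		}
	}

The extra bookkeeping (and the missing checks for pending events) is why
just returning RET_PF_RETRY is the simpler fix for stable.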
>> I think it is a good design to allow breaking mmu_lock when the mmu is
>> handling heavy work.
> I don't disagree in principle, but I question the relevance/need.  I doubt this
> code is relevant to nested TDP performance as hypervisors generally don't do the
> type of PTE manipulations that would lead to linking an existing unsync sp.  And
> for legacy shadow paging, my preference would be to put it into maintenance-only
> mode as much as possible.  I'm not dead set against new features/functionality
> for shadow paging, but for something like dropping mmu_lock in the page fault path,
> IMO there needs to be performance numbers to justify such a change.
I understand the concern and the relevance/need.  For completeness, a sketch
of the mmu_lock-break pattern I had in mind follows below.
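It is roughly the following; the can_yield name and the -EINTR convention
are my reading of the patch under discussion, so treat the details as
illustrative rather than the final code.

static int mmu_sync_children(struct kvm_vcpu *vcpu,
			     struct kvm_mmu_page *parent, bool can_yield)
{
	struct kvm_mmu_pages pages;

	while (mmu_unsync_walk(parent, &pages)) {
		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
			/*
			 * Syncing the children is heavy work; break
			 * mmu_lock if allowed, otherwise bail out and let
			 * the caller return RET_PF_RETRY.
			 */
			if (!can_yield)
				return -EINTR;
			cond_resched_rwlock_write(&vcpu->kvm->mmu_lock);
		}
		/* ... kvm_sync_page() each sp gathered in 'pages' ... */
	}
	return 0;
}

The fetch path would pass can_yield=false because it cannot tolerate
mmu_lock being dropped in the middle of the shadow walk; callers that hold
no walk state could pass true.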