On Wed, Jan 05, 2022, David Matlack wrote: > On Thu, Dec 23, 2021 at 10:23:13PM +0000, Sean Christopherson wrote: > > Zapping at 1gb in the first pass is not arbitrary. First and foremost, > > KVM relies on being able to zap 1gb shadow pages in a single shot when > > when repacing a shadow page with a hugepage. > > When dirty logging is disabled, zap_collapsible_spte_range() does the > bulk of the work zapping leaf SPTEs and allows yielding. I guess that > could race with a vCPU faulting in the huge page though and the vCPU > could do the bulk of the work. > > Are there any other scenarios where KVM relies on zapping 1GB worth of > 4KB SPTEs without yielding? Yes. Zapping executable shadow pages that were forced to be small because of the iTLB multihit mitigation. If the VM is using nested EPT and a shadow page is unaccounted, in which case decrementing disallow_lpage could allow a hugepage and a fault in the 1gb region that installs a 1gb hugepage would then zap the shadow page. There are other scenarios, though they are much more contrived, e.g. if the guest changes its MTRRs such that a previously disallowed hugepage is now allowed. > In any case, 100ms is a long time to hog the CPU. Why not just do the > safe thing and zap each level? 4K, then 2M, then 1GB, ..., then root > level. The only argument against it I can think of is performance (lots > of redundant walks through the page table). But I don't think root > zapping is especially latency critical. I'm not opposed to that approach, assuming putting a root is done asynchronously so that the high latency isn't problematic for vCPUs. Though from a test coverage perspective, I do like zapping at the worst case level (for the above flows). And regarding the latency, if it's problematic we could track the number of present SPTEs and skip the walk if there are none. The downside is that doing so would require an atomic operation when installing SPTEs to play nice with parallel page faults. > > @@ -846,6 +858,11 @@ static void tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root, > > } > > } > > > > + if (zap_level < root->role.level) { > > + zap_level = root->role.level; > > + goto start; > > + } > > This is probably just person opinion but I find the 2 iteration goto > loop harder to understand than just open-coding the 2 passes. Yeah, but it's clever! I'll add another helper unless it turns out gross for some reason. :-)