On Mon, Dec 13, 2021, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Sean Christopherson wrote:
> > On Mon, Dec 13, 2021, Ignat Korchagin wrote:
> > > Unfortunately, this patchset does not fix the original issue reported in [1].
> >
> > Can you provide your kernel config?  And any other version/config info that might
> > be relevant, e.g. anything in gvisor or runsc?
>
> Scratch that, I've reproduced this, with luck I'll have a root cause by end of day.

Ok, the root cause is comically simple compared to all the theories we came up with.

If tdp_mmu_iter_cond_resched() drops mmu_lock and restarts the iterator, the
"continue" in the caller triggers tdp_iter_next().  tdp_iter_next() does what it's
told and advances the iterator.  Because all users call tdp_mmu_iter_cond_resched()
at the very beginning of the loop, this has the effect of skipping the current SPTE.

E.g. in the "zap all" case, where iter->level == iter->min_level == iter->root_level,
we effectively end up with code like this, which is obviously wrong once the
complexity of traversing a tree is simplified down to walking an array of SPTEs.

	gfn_t end = tdp_mmu_max_gfn_host();
	gfn_t start = 0;
	gfn_t last;

	for (i = last = start; i < end; i += 8, last = i) {
		if (cond_resched()) {
			i = last;
			continue;
		}
		sp = &root->spt[i];
		zap(sp);
	}

Patch incoming...
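
If it helps to see the skip in isolation, here is a standalone userspace sketch of
the same pattern (fake_cond_resched(), NR_ENTRIES, zapped[] and friends are made up
for illustration; this is not the actual KVM code).  "Restoring" the index before
the "continue" buys nothing, because the continue still runs the loop's increment
expression, so the entry that should be reprocessed after the resched is silently
skipped.

	#include <stdio.h>
	#include <stdbool.h>

	#define NR_ENTRIES 8

	/* Stand-in for tdp_mmu_iter_cond_resched(): pretend we yielded. */
	static bool fake_cond_resched(int i)
	{
		return (i % 2) == 0;	/* "drop the lock" on every other entry */
	}

	int main(void)
	{
		bool zapped[NR_ENTRIES] = { false };
		int i, last;

		for (i = last = 0; i < NR_ENTRIES; i++, last = i) {
			if (fake_cond_resched(i)) {
				/*
				 * Try to restart at the current entry, but the
				 * "continue" still executes "i++, last = i", so
				 * entry "last" is never zapped.
				 */
				i = last;
				continue;
			}
			zapped[i] = true;
		}

		for (i = 0; i < NR_ENTRIES; i++)
			printf("entry %d: %s\n", i, zapped[i] ? "zapped" : "SKIPPED");
		return 0;
	}

Running it prints SKIPPED for every entry where the fake resched fired, which mirrors
the SPTEs that get left behind in the real tree walk.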