On Fri, Jun 11, 2021, Sean Christopherson wrote:
> On Fri, Jun 11, 2021, Vitaly Kuznetsov wrote:
> > What I don't quite like (besides the fact that this 'nested_mmu' exists
> > but I don't see an elegant way to get rid of it) is the fact that we now
> > have the same logic to compute 'level' both in
> > kvm_calc_nested_mmu_role() and init_kvm_nested_mmu(). We could've
> > avoided that by re-arranging code in init_kvm_nested_mmu() I
> > guess. Something like (untested):
>
> Yep, cleaning all that up is on my todo list, but there are some hurdles to
> clear first.
>
> My thought is to either (a) initialize the context from the role, or (b) drop the
> duplicate context information altogether. For (a), the NX bit is calculated
> incorrectly in the role stuff, e.g. if paging is disabled then NX is effectively 0,
> and I need that fix for the vCPU RESET/INIT series. It's benign for the role,
> but not for the context. And (b) will require auditing for all flavors of MMUs;
> I wouldn't be the least bit surprised to discover there's a corner case (or just
> a regular case) that I'm overlooking.

Ugh, nested NPT is completely fubar. Except for the "core" mode, all of the role
and context calculations are done using L2 state instead of L1 host state. The
APM explicitly states that CR0.WP is ignored, and SMEP/SMAP are implicitly ignored
by virtue of the NPT walks always being tagged "user", but KVM botches the NX
behavior and would mess up LA57 if it were supported.

I'll sort out the mess, though I'm not sure how it will interact with the reset
series...
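
To make the two points above concrete, here is a rough, untested sketch and not KVM code. Every name in it (struct ctl_state, struct sketch_role, effective_nx(), npt_walk_role()) is hypothetical and made up for illustration. It assumes NX only counts when paging is enabled, and that the role for nested NPT walks should be built from L1 host state. Showing WP/SMEP/SMAP as cleared bits is just this sketch's way of modeling "ignored".

```c
#include <stdbool.h>

/* Hypothetical snapshot of the control bits that feed an MMU role. */
struct ctl_state {
	bool cr0_pg;	/* CR0.PG: paging enabled */
	bool cr0_wp;	/* CR0.WP */
	bool cr4_smep;	/* CR4.SMEP */
	bool cr4_smap;	/* CR4.SMAP */
	bool efer_nx;	/* EFER.NX */
};

/* Hypothetical, heavily simplified role; NOT KVM's union kvm_mmu_role. */
struct sketch_role {
	bool nx;
	bool wp;
	bool smep;
	bool smap;
};

/* NX is effectively 0 when paging is disabled, whatever EFER.NX says. */
static bool effective_nx(const struct ctl_state *s)
{
	return s->cr0_pg && s->efer_nx;
}

/*
 * Role for walking L1's nested page tables.  Only L1 host state is
 * consulted; L2 state is accepted solely to show that it is unused.
 */
static struct sketch_role npt_walk_role(const struct ctl_state *l1_host,
					const struct ctl_state *l2)
{
	struct sketch_role role;

	(void)l2;	/* deliberately ignored */

	role.nx = effective_nx(l1_host);

	/* Modeled as cleared: CR0.WP is ignored, walks are tagged "user". */
	role.wp = false;
	role.smep = false;
	role.smap = false;

	return role;
}
```

With this shape, flipping any bit in the L2 state cannot change the resulting role, which is the property the current code lacks.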