On 2/4/22 20:18, David Matlack wrote:
On Fri, Feb 04, 2022 at 06:57:01AM -0500, Paolo Bonzini wrote:
__kvm_mmu_new_pgd looks at the MMU's root_level and shadow_root_level
via fast_pgd_switch.
Those checks are just for performance correct (to skip iterating through
the list of roots)?
Either way, it's probably worth including a Fixes tag below.
It's actually a much bigger mess---though it's working as intended,
except that in some cases the caching is suboptimal.
Basically, you have to call __kvm_mmu_new_pgd *before* kvm_init_mmu
because of how fast_pgd_switch takes care of stashing the current root
in the cache. A PAE root is never placed in the cache exactly because
mmu->root_level and mmu->shadow_root_level hold the value for the
_current_ root. In that case, fast_pgd_switch does not look for a
cached PGD (even if according to the role it could be there).
__kvm_mmu_new_pgd then frees the current root using kvm_mmu_free_roots.
kvm_mmu_free_roots *also* uses mmu->root_level and
mmu->shadow_root_level to distinguish whether the page table uses a
single root or 4 PAE roots. Because kvm_init_mmu can overwrite
muu->root_level, kvm_mmu_free_roots must also be called before kvm_init_mmu.
I do wonder if the use of mmu->shadow_root_level was intentional (it
certainly escaped me when first reviewing fast PGD switching), but
fortunately none of this is necessary. PAE roots can be identified from
!to_shadow_page(root_hpa), so the better fix is to do that:
- in fast_pgd_switch/cached_root_available, you need to account for two
possible transformations of the cache: either the old entry becomes the
MRU of the prev_roots as in the current code, or the old entry cannot be
cached. This is 20-30 more lines of code.
- in kvm_mmu_free_roots, just use to_shadow_page instead of
mmu->root_level and mmu->shadow_root_level.
Once this is in place, the original bug is fixed simply by calling
kvm_mmu_new_pgd after kvm_init_mmu. kvm_mmu_reset_context is not doing
now to avoid having to figure out the new role, but it can do that
easily after the above change.
I'll keep this cpu_role refactoring around, but strictly speaking it
becomes a separate change than the optimization.
Paolo