On 2/9/22 18:07, Sean Christopherson wrote:
> On Wed, Feb 09, 2022, Paolo Bonzini wrote:
>> The TDP MMU has a performance regression compared to the legacy MMU
>> when CR0 changes often. This was reported for the grsecurity kernel,
>> which uses CR0.WP to implement kernel W^X. In that case, each change to
>> CR0.WP unloads the MMU and causes a lot of unnecessary work. When running
>> nested, this can even cause L1 to make hardly any progress, as the L0
>> hypervisor is overwhelmed by the amount of MMU work that is needed.
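
For the archives, since the "why" is a bit buried in the MMU role
plumbing: CR0.WP is one of the CR0 bits that is folded into the MMU
role, so every WP flip looks like a brand-new MMU to KVM.  A condensed
sketch of the relevant path (not verbatim, surrounding checks and PCID
handling trimmed):

/* Condensed sketch, not verbatim kernel code. */
#define KVM_MMU_CR0_ROLE_BITS	(X86_CR0_PG | X86_CR0_WP)

void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0,
		      unsigned long cr0)
{
	/* CR0.WP is part of the MMU role, so a flip means "new MMU". */
	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
		kvm_mmu_reset_context(vcpu);
}

void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
	kvm_mmu_unload(vcpu);	/* drop the roots; the TDP MMU zaps the tree */
	kvm_init_mmu(vcpu);	/* recompute the role, rebuild on next entry */
}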
> FWIW, my flushing/zapping series fixes this by doing the teardown in an async
> worker. There's even a selftest for this exact case :-)
>
> https://lore.kernel.org/all/20211223222318.1039223-1-seanjc@xxxxxxxxxx
I'll check it out (it's next on my list as soon as I finally push
kvm/{master,next}, which in turn was blocked by this work).
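
If I understand the idea correctly, the shape is roughly the below:
mark the dead root invalid right away so vCPUs stop picking it up, and
punt the actual page table teardown to a workqueue instead of doing it
synchronously.  The names are made up for illustration, I have not gone
through the patches yet:

/* Illustration only, invented names; not the actual series. */
static void tdp_mmu_zap_root_work(struct work_struct *work)
{
	struct kvm_mmu_page *root = container_of(work, struct kvm_mmu_page,
						 tdp_mmu_async_work);

	/* The expensive teardown runs here, off the vCPU's path. */
	tdp_mmu_zap_root(root);
}

static void tdp_mmu_schedule_zap_root(struct kvm *kvm,
				      struct kvm_mmu_page *root)
{
	root->role.invalid = true;	/* no new users of this root */
	INIT_WORK(&root->tdp_mmu_async_work, tdp_mmu_zap_root_work);
	queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
}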
But not zapping the roots is even better---especially when KVM is nested
and the TDP MMU's page table rebuild is very heavy on L0. I'm not sure
if there are any (cumulative) stats that capture the optimization, but
if not I'll add them.
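
If nothing fits, the stats infrastructure makes it a pretty small
addition; something along these lines, with an invented counter name
just to show the shape:

/* Counter name invented; only to show the plumbing. */

/* arch/x86/include/asm/kvm_host.h */
struct kvm_vcpu_stat {
	/* ... existing fields ... */
	u64 cached_root_reused;
};

/* arch/x86/kvm/x86.c */
const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
	/* ... existing descriptors ... */
	STATS_DESC_COUNTER(VCPU, cached_root_reused),
};

/* wherever a previously cached root is reused instead of rebuilt */
++vcpu->stat.cached_root_reused;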
Paolo