On 16/02/21 19:26, Joerg Roedel wrote:
On Tue, Feb 16, 2021 at 05:48:29PM +0100, Paolo Bonzini wrote:
We should minimize the number of #VEs that we get, as they are very slow.
Could almost everything that can invoke a #VE go through pvops and be turned
into a TDCALL? And if so the same should be true for SEV-ES #VC as well.
The problem with that is that it requires the guest to know what the
hypervisor will intercept or what instruction will cause a #VE. I
considered this paravirtualization for #VC, but stayed away from it for
that exact reason. You can't easily know which MMIO access will cause a
#VE/#VC exception. Probing also doesn't work, as the hypervisor can
change that at runtime. There is just no decent way to handle that
without taking the #VE/#VC. Or take 'hlt', for example: there are
hypervisor configurations which don't intercept it. How do you know that
from within the guest?
I'm thinking that the SEV-ES/TDX specs and the hypervisor's PV interface
(CPUID/MSR) should tell the guest what to invoke directly, not the other
way round. TDCALL-ing out should always be possible.
Not saying this is the case right now, but I think the SEV-ES and TDX
specs should evolve in that direction.
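
To make the idea concrete, here is a minimal sketch of a guest picking its
path per operation from a feature word advertised by the PV interface. The
feature bits and all names (PV_FEAT_DIRECT_HLT, PV_FEAT_DIRECT_MMIO,
pv_select_paths) are hypothetical; nothing like them is defined by the
SEV-ES or TDX specs today, and how the feature word would be read
(CPUID leaf or MSR) is left open.

```c
#include <stdint.h>

/*
 * HYPOTHETICAL feature bits: not defined by any SEV-ES/TDX spec. They
 * stand in for whatever a PV interface (CPUID/MSR) would report.
 */
#define PV_FEAT_DIRECT_HLT   (1u << 0)  /* guest should call out for 'hlt' */
#define PV_FEAT_DIRECT_MMIO  (1u << 1)  /* guest should call out for MMIO */

enum op_path {
    OP_NATIVE,       /* execute the instruction; it may or may not trap */
    OP_DIRECT_CALL,  /* skip the exception and call out (e.g. TDCALL) */
};

struct pv_paths {
    enum op_path hlt;
    enum op_path mmio;
};

/* Choose a path per operation from the advertised feature word. */
static struct pv_paths pv_select_paths(uint32_t features)
{
    struct pv_paths p;

    p.hlt  = (features & PV_FEAT_DIRECT_HLT)  ? OP_DIRECT_CALL : OP_NATIVE;
    p.mmio = (features & PV_FEAT_DIRECT_MMIO) ? OP_DIRECT_CALL : OP_NATIVE;
    return p;
}
```

With no bits set the guest keeps executing the native instruction and takes
the #VE/#VC only if the hypervisor happens to intercept it, so the
exception handler is still needed as a fallback.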
Paolo
I guess those could all be replaced by direct TDCALLs,
but the question remains whether this is possible with MSR accesses, which
would mean that the list of MSRs which will cause #VEs is statically defined and
doesn't change between hypervisors. All in all this sounds hard to
maintain and easy to break by unrelated changes.
I would expect that all MSRs except for a handful (SPEC_CTRL/PRED_CMD, the
FS/GS/kernelGS bases, anything else?) would be redirected to TDCALL.
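
A rough sketch of what such a static split could look like in the guest.
The MSR indices are the architectural ones; the allowlist itself and the
names (direct_msrs, msr_access_path) are assumptions for illustration only,
not something taken from the TDX spec or from existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Architectural MSR indices. */
#define MSR_IA32_SPEC_CTRL   0x00000048u
#define MSR_IA32_PRED_CMD    0x00000049u
#define MSR_FS_BASE          0xc0000100u
#define MSR_GS_BASE          0xc0000101u
#define MSR_KERNEL_GS_BASE   0xc0000102u

/* ASSUMED allowlist: the handful of MSRs accessed with plain RDMSR/WRMSR. */
static const uint32_t direct_msrs[] = {
    MSR_IA32_SPEC_CTRL,
    MSR_IA32_PRED_CMD,
    MSR_FS_BASE,
    MSR_GS_BASE,
    MSR_KERNEL_GS_BASE,
};

enum msr_path {
    MSR_PATH_DIRECT,  /* native RDMSR/WRMSR, expected not to trap */
    MSR_PATH_TDCALL,  /* everything else is redirected to TDCALL */
};

/* Decide how the guest accesses a given MSR. */
static enum msr_path msr_access_path(uint32_t msr)
{
    size_t i;

    for (i = 0; i < sizeof(direct_msrs) / sizeof(direct_msrs[0]); i++) {
        if (direct_msrs[i] == msr)
            return MSR_PATH_DIRECT;
    }
    return MSR_PATH_TDCALL;
}
```

This only works if every hypervisor agrees on the same list, which is why
the list would have to be fixed by the spec.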
You never know which HV your guest runs under and what it intercepts. It
can certainly be made part of the Spec to only allow direct access to a
given set of MSRs in a TDX guest and require everything else to be
intercepted. But that Spec probably requires constant updating and will
certainly cause compatibility headaches in the future.
Regards,
Joerg