On 12/17/2024 9:53 AM, Sean Christopherson wrote:
On Tue, Dec 10, 2024, Rick P Edgecombe wrote:
On Tue, 2024-12-10 at 11:22 +0800, Xiaoyao Li wrote:
The solution in this proposal decreases the work the VMM has to do, but
in the long term it won't remove hand coding completely. Since we are
designing something anyway, what kind of bar should we target?
For this specific #VE reduction case, I think userspace doesn't need to
do any hand coding. Userspace just treats the bits related to #VE
reduction as configurable, as reported by the TDX module/KVM. And
userspace doesn't care whether the value seen by the TD guest matches
what it configured, because that is out of userspace's control.
Beyond being a specific problem, #VE reduction is also an example of the
increasing complexity of TD CPUID. If more things like it come along,
this interface could become too rigid.
I agree with Rick in that having QEMU treat them as configurable is going to be
a disaster. But I don't think it's actually problematic in practice.
Let me correct the proposal: QEMU should treat them as whatever KVM reports.
The TDX module reports these #VE-reduction-related CPUID bits as
configurable because it allows the VMM to paravirtualize them. If KVM
doesn't support paravirtualizing them, KVM can clear them from the
configurable bits and add them to the fixed0 bits when it reports to
userspace.
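A minimal kernel-style sketch of that demotion, with made-up names (the
real masks would come from the TDX module metadata and KVM's capability
reporting, not from these structures):

	struct tdx_cpuid_policy {
		u64 configurable;	/* bits userspace may set */
		u64 fixed0;		/* bits forced to 0 */
		u64 fixed1;		/* bits forced to 1 */
	};

	/*
	 * Hypothetical helper: demote configurable bits that KVM can't
	 * paravirtualize to fixed0 before reporting the policy to userspace.
	 */
	static void tdx_demote_unsupported_paravirt(struct tdx_cpuid_policy *p,
						    u64 kvm_paravirt_mask)
	{
		u64 unsupported = p->configurable & ~kvm_paravirt_mask;

		p->configurable &= ~unsupported;
		p->fixed0 |= unsupported;
	}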
If QEMU (or KVM) has no visibility into the state of the guest's view of the
affected features, then it doesn't matter whether they are fixed or configurable.
They're effectively Schrödinger's bits: until QEMU/KVM actually looks at them,
they're neither dead nor alive, and since QEMU/KVM *can't* look at them, who cares?
To some degree, I think it matters. As I explained above, if KVM reports
a bit as configurable to userspace, it means the TDX module allows it to
be configured and KVM allows it to be paravirtualized as well. So
userspace can configure it to 1 when the user wants it. This is how the
VMM presents the feature to the TD guest.

However, how the TD guest uses it is up to the guest itself:
1) The TD guest doesn't enable #VE reduction: the configuration from the
VMM doesn't matter. The CPUID bits are fixed1 and the related operations
lead to #VE.
2) The TD guest enables #VE reduction but doesn't enable the related bit
in TDCS.FEATURE_PARAVIRT_CTRL: the configuration from the VMM doesn't
matter. The CPUID bits are fixed0 and the related operations lead to #GP.
3) The TD guest enables #VE reduction and enables the related bit in
TDCS.FEATURE_PARAVIRT_CTRL: the configuration from the VMM matters.
   - When the VMM configures the bits to 1, the related operations lead
     to #VE (for paravirtualization).
   - When the VMM configures the bits to 0, the related operations lead
     to #GP.
So for case 3), it does matter.
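To spell out the matrix above as code (purely illustrative; the behavior
is enforced by the TDX module/CPU, and the names are made up):

	enum td_op_result { RESULT_VE, RESULT_GP };

	/*
	 * Illustrative encoding of cases 1)-3) for an operation covered by
	 * a #VE-reduction CPUID bit.
	 */
	static enum td_op_result td_op_result(bool guest_enables_ve_reduction,
					      bool guest_enables_paravirt_ctrl,
					      bool vmm_configures_bit_to_1)
	{
		if (!guest_enables_ve_reduction)
			return RESULT_VE;	/* case 1: CPUID bit is fixed1 */
		if (!guest_enables_paravirt_ctrl)
			return RESULT_GP;	/* case 2: CPUID bit is fixed0 */
		/* case 3: the VMM's configuration decides */
		return vmm_configures_bit_to_1 ? RESULT_VE : RESULT_GP;
	}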
So, if the TDX Module *requires* them to be set/cleared when the TD is created,
then they should be reported as fixed. If the TDX module doesn't care, then they
should be reported as configurable. The fact that the guest can muck with things
under the hood doesn't factor into that logic.
Yes, I agree with that.
If TDX pulls something like this for features that KVM cares about, then we have
problems, but that's already true today. If a feature requires KVM support, it
doesn't really matter if the feature is fixed or configurable. What matters is
that KVM has a chance to enforce that the feature can be used by the guest if
and only if KVM has the proper support in place. Because if KVM is completely
unaware of a feature, it's impossible for KVM to know that the feature needs to
be rejected.
I agree.
With the proposed fixed0/fixed1 information, in addition to the
configurable bits, KVM can fully validate the TDX module against its own
capabilities. When a violation occurs (e.g., a bit that KVM doesn't
support being reported as fixed1 by the TDX module), KVM can simply
refuse to enable TDX.
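Roughly, with hypothetical names, just to illustrate the check:

	/*
	 * Hypothetical check: refuse to enable TDX if the module forces on
	 * (fixed1) any CPUID bit that KVM doesn't know how to support.
	 */
	static int tdx_check_fixed1(u64 fixed1_bits, u64 kvm_supported_bits)
	{
		u64 violations = fixed1_bits & ~kvm_supported_bits;

		if (violations) {
			pr_warn("TDX module forces unsupported CPUID bits: 0x%llx\n",
				violations);
			return -EINVAL;
		}

		return 0;
	}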
This isn't unique to TDX, CoCo, or firmware. Every new feature that lands in
hardware needs to either be "benign" or have the appropriate virtualization
controls. KVM already has to deal with cases where features can effectively be
used without KVM's knowledge. E.g. there are plenty of instruction-level
virtualization holes, and SEV-ES doubled down by essentially forcing KVM to let
the guest write XCR0 and XSS directly.
It all works, so long as the hardware vendor doesn't screw up and let the guest
use a feature that impacts host safety and/or functionality, without the hypervisor's
knowledge.
So, just don't screw up :-)