On 9/9/24 23:11, Sean Christopherson wrote:
In general, I am_very_ opposed to blindly retrying an SEPT SEAMCALL, ever. For
its operations, I'm pretty sure the only sane approach is for KVM to ensure there
will be no contention. And if the TDX module's single-step protection spuriously
kicks in, KVM exits to userspace. If the TDX module can't/doesn't/won't communicate
that it's mitigating single-step, e.g. so that KVM can forward the information
to userspace, then that's a TDX module problem to solve.
In principle I agree but we also need to be pragmatic. Exiting to
userspace may not be practical in all flows, for example.
First of all, we can add a spinlock around affected seamcalls. This way
we know that "busy" errors must come from the guest and have set
HOST_PRIORITY. It is still kinda bad that guests can force the VMM to
loop, but the VMM can always say enough is enough. In other words,
let's assume that a limit of 16 is probably appropriate but we can also
increase the limit and crash the VM if things become ridiculous.
Something like this:
static u32 max = 16;
int retry = 0;
spin_lock(&kvm->arch.seamcall_lock);
for (;;) {
args_in = *in;
ret = seamcall_ret(op, in);
if (++retry == 1) {
/* protected by the same seamcall_lock */
kvm->stat.retried_seamcalls++;
} else if (retry == READ_ONCE(max)) {
pr_warn("Exceeded %d retries for S-EPT operation\n", max);
if (KVM_BUG_ON(kvm, retry == 1024)) {
pr_err("Crashing due to lock contention in the TDX module\n");
break;
}
cmpxchg(&max, retry, retry * 2);
}
}
spin_unlock(&kvm->arch.seamcall_lock);
This way we can do some testing and figure out a useful limit.
For zero step detection, my reading is that it's TDH.VP.ENTER that
fails; not any of the MEM seamcalls. For that one to be resolved, it
should be enough to do take and release the mmu_lock back to back, which
ensures that all pending critical sections have completed (that is,
"write_lock(&kvm->mmu_lock); write_unlock(&kvm->mmu_lock);"). And then
loop. Adding a vCPU stat for that one is a good idea, too.
Paolo