On Mon, 2024-09-09 at 17:25 +0200, Paolo Bonzini wrote: > On 9/4/24 05:07, Rick Edgecombe wrote: > > +static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in) > > +{ > > +#define SEAMCALL_RETRY_MAX 16 > > How is the 16 determined? Also, is the lock per-VM or global? The lock being considered here is per-TD, but TDX_OPERAND_BUSY in general can be for other locks. I'm not sure where the 16 came from, maybe Yuan or Isaku can share the history. In any case, there seems to be some problems with this patch or justification. Regarding the zero-step mitigation, the TDX Module has a mitigation for an attack where a malicious VMM causes repeated private EPT violations for the same GPA. When this happens TDH.VP.ENTER will fail to enter the guest. Regardless of zero-step detection, these SEPT related SEAMCALLs will exit with the checked error code if they contend the mentioned lock. If there was some other (non- zero-step related) contention for this lock and KVM tries to re-enter the TD too many times without resolving an EPT violation, it might inadvertently trigger the zero-step mitigation. I *think* this patch is trying to say not to worry about this case, and do a simple retry loop instead to handle the contention. But why 16 retries would be sufficient, I can't find a reason for. Getting this required retry logic right is important because some failures (TDH.MEM.RANGE.BLOCK) can lead to KVM_BUG_ON()s. Per the docs, in general the VMM is supposed to retry SEAMCALLs that return TDX_OPERAND_BUSY. I think we need to revisit the general question of which SEAMCALLs we should be retrying and how many times/how long. The other consideration is that KVM already has per-VM locking, that would prevent contention for some of the locks. So depending on internal details KVM may not need to do any retries in some cases.