On Mon, Sep 09, 2024, Sean Christopherson wrote: > On Mon, Sep 09, 2024, Rick P Edgecombe wrote: > > On Mon, 2024-09-09 at 17:25 +0200, Paolo Bonzini wrote: > > > On 9/4/24 05:07, Rick Edgecombe wrote: > > > > +static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in) > > > > +{ > > > > +#define SEAMCALL_RETRY_MAX 16 > > > > > > How is the 16 determined? Also, is the lock per-VM or global? > > > > The lock being considered here is per-TD, but TDX_OPERAND_BUSY in general can be > > for other locks. I'm not sure where the 16 came from, maybe Yuan or Isaku can > > share the history. In any case, there seems to be some problems with this patch > > or justification. > > > > Regarding the zero-step mitigation, the TDX Module has a mitigation for an > > attack where a malicious VMM causes repeated private EPT violations for the same > > GPA. When this happens TDH.VP.ENTER will fail to enter the guest. Regardless of > > zero-step detection, these SEPT related SEAMCALLs will exit with the checked > > error code if they contend the mentioned lock. If there was some other (non- > > zero-step related) contention for this lock and KVM tries to re-enter the TD too > > many times without resolving an EPT violation, it might inadvertently trigger > > the zero-step mitigation. I *think* this patch is trying to say not to worry > > about this case, and do a simple retry loop instead to handle the contention. > > > > But why 16 retries would be sufficient, I can't find a reason for. Getting this > > required retry logic right is important because some failures > > (TDH.MEM.RANGE.BLOCK) can lead to KVM_BUG_ON()s. > > I (somewhat indirectly) raised this as an issue in v11, and at a (very quick) > glance, nothing has changed to alleviate my concerns. Gah, went out of my way to find the thread and then forgot to post the link: https://lore.kernel.org/all/Y8m34OEVBfL7Q4Ns@xxxxxxxxxx > In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL, ever. For > its operations, I'm pretty sure the only sane approach is for KVM to ensure there > will be no contention. And if the TDX module's single-step protection spuriously > kicks in, KVM exits to userspace. If the TDX module can't/doesn't/won't communicate > that it's mitigating single-step, e.g. so that KVM can forward the information > to userspace, then that's a TDX module problem to solve. > > > Per the docs, in general the VMM is supposed to retry SEAMCALLs that return > > TDX_OPERAND_BUSY. > > IMO, that's terrible advice. SGX has similar behavior, where the xucode "module" > signals #GP if there's a conflict. #GP is obviously far, far worse as it lacks > the precision that would help software understand exactly what went wrong, but I > think one of the better decisions we made with the SGX driver was to have a > "zero tolerance" policy where the driver would _never_ retry due to a potential > resource conflict, i.e. that any conflict in the module would be treated as a > kernel bug. > > > I think we need to revisit the general question of which > > SEAMCALLs we should be retrying and how many times/how long. The other > > consideration is that KVM already has per-VM locking, that would prevent > > contention for some of the locks. So depending on internal details KVM may not > > need to do any retries in some cases. > > Yes, and if KVM can't avoid conflict/retry, then before we go any further, I want > to know exactly why that is the case.