On Tue, Sep 10, 2024, Rick P Edgecombe wrote:
> On Mon, 2024-09-09 at 16:58 -0700, Sean Christopherson wrote:
> > On Mon, Sep 09, 2024, Rick P Edgecombe wrote:
> > > On Mon, 2024-09-09 at 14:23 -0700, Sean Christopherson wrote:
> > > > > In general, I am _very_ opposed to blindly retrying an SEPT SEAMCALL,
> > > > > ever. For its operations, I'm pretty sure the only sane approach is for
> > > > > KVM to ensure there will be no contention. And if the TDX module's
> > > > > single-step protection spuriously kicks in, KVM exits to userspace. If
> > > > > the TDX module can't/doesn't/won't communicate that it's mitigating
> > > > > single-step, e.g. so that KVM can forward the information to userspace,
> > > > > then that's a TDX module problem to solve.
> > > > >
> > > > > > Per the docs, in general the VMM is supposed to retry SEAMCALLs that
> > > > > > return TDX_OPERAND_BUSY.
> > > > >
> > > > > IMO, that's terrible advice. SGX has similar behavior, where the xucode
> > > > > "module" signals #GP if there's a conflict. #GP is obviously far, far
> > > > > worse as it lacks the precision that would help software understand
> > > > > exactly what went wrong, but I think one of the better decisions we made
> > > > > with the SGX driver was to have a "zero tolerance" policy where the
> > > > > driver would _never_ retry due to a potential resource conflict, i.e.
> > > > > that any conflict in the module would be treated as a kernel bug.
> > >
> > > Thanks for the analysis. The direction seems reasonable to me for this lock in
> > > particular. We need to do some analysis on how much the existing mmu_lock can
> > > protects us.
> >
> > I would operate under the assumption that it provides SEPT no meaningful
> > protection.
> > I think I would even go so far as to say that it is a _requirement_ that mmu_lock
> > does NOT provide the ordering required by SEPT, because I do not want to take on
> > any risk (due to SEPT constraints) that would limit KVM's ability to do things
> > while holding mmu_lock for read.
>
> Ok. Not sure, but I think you are saying not to add any extra acquisitions of
> mmu_lock.

No new write_lock. If read_lock is truly needed, no worries. But SEPT needing
a write_lock is likely a hard "no", as the TDP MMU's locking model depends
heavily on vCPUs being readers. E.g. the TDP MMU has _much_ coarser granularity
than core MM, but it works because almost everything is done while holding
mmu_lock for read.

> Until we answer some of the questions (i.e. HOST_PRIORITY exposure), it's hard
> to say. We need to check some stuff on our end.

Ya, agreed.
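
For illustration only, here is a rough userspace sketch of the "zero tolerance"
idea discussed above: issue the operation exactly once, and treat a busy status
as a bug in the caller's own serialization rather than something to retry. Every
name in it (sept_status, sept_op_no_retry, the stubs, the bug counter) is
hypothetical and made up for this sketch. None of it is KVM or TDX module code,
and the status values are not the real TDX encodings.

```c
#include <stdio.h>

/* Hypothetical status codes; NOT the real TDX module encodings. */
enum sept_status { SEPT_OK = 0, SEPT_OPERAND_BUSY = 1 };

/* Hypothetical outcome that the caller acts on. */
enum sept_result { SEPT_DONE = 0, SEPT_EXIT_TO_USERSPACE = 1 };

/* Stand-in for a SEAMCALL; the signature is an assumption for illustration. */
typedef enum sept_status (*sept_op_fn)(void *ctx);

/* Counts how often the "no contention" assumption was violated. */
static unsigned int sept_bug_count;

/*
 * Issue the operation exactly once.  A busy status is never retried: it is
 * recorded as a bug in the caller's serialization and turned into an exit.
 */
static enum sept_result sept_op_no_retry(sept_op_fn op, void *ctx)
{
	enum sept_status st = op(ctx);

	if (st == SEPT_OPERAND_BUSY) {
		sept_bug_count++;
		fprintf(stderr, "SEPT op hit unexpected contention, not retrying\n");
		return SEPT_EXIT_TO_USERSPACE;
	}
	return SEPT_DONE;
}

/* Stub operations that count how many times they were invoked. */
static enum sept_status stub_ok(void *ctx)
{
	(*(unsigned int *)ctx)++;
	return SEPT_OK;
}

static enum sept_status stub_busy(void *ctx)
{
	(*(unsigned int *)ctx)++;
	return SEPT_OPERAND_BUSY;
}
```

Calling sept_op_no_retry(stub_busy, &calls) invokes the stub once and returns
SEPT_EXIT_TO_USERSPACE. The point of the shape is that there is no loop to hide
a missing lock: contention is either prevented by the caller or surfaced.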