On Thu, 2024-11-21 at 19:51 +0800, Yan Zhao wrote: > ==proposal details== > > The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs > together which are required for page installation and uninstallation. > > These SEAMCALLs can be divided into three groups: > Group 1: tdh_mem_page_add(). > The SEAMCALL is invoked only during TD build time and therefore > KVM has ensured that no contention will occur. > > Proposal: (as in patch 1) > Just return error when TDX_OPERAND_BUSY is found. > > Group 2: tdh_mem_sept_add(), tdh_mem_page_aug(). > These two SEAMCALLs are invoked for page installation. > They return TDX_OPERAND_BUSY when contending with tdh_vp_enter() > (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(), > tdg_mem_page_attr_rd/wr(). We did verify with TDX module folks that the TDX module could be changed to not take the sept host priority lock for zero entries (that happen during the guest operations). In that case, I think we shouldn't expect contention for tdh_mem_sept_add() and tdh_mem_page_aug() from them? We still need it for tdh_vp_enter() though. > > Proposal: (as in patch 1) > - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY > to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault(). > > - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as > long as there are no pending signals/interrupts. > > The retry inside TDX aims to reduce the count of tdh_vp_enter() > before resolving EPT violations in the local vCPU, thereby > minimizing contentions with other vCPUs. However, it can't > completely eliminate 0-step mitigation as it exits when there're > pending signals/interrupts and does not (and cannot) remove the > tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT. > > Resources SHARED users EXCLUSIVE users > ------------------------------------------------------------ > SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation) > tdh_mem_page_aug > ------------------------------------------------------------ > SEPT entry tdh_mem_sept_add (Host lock) > tdh_mem_page_aug (Host lock) > tdg_mem_page_accept (Guest lock) > tdg_mem_page_attr_rd (Guest lock) > tdg_mem_page_attr_wr (Guest lock) > > Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove(). > These three SEAMCALLs are invoked for page uninstallation, with > KVM mmu_lock held for writing. > > Resources SHARED users EXCLUSIVE users > ------------------------------------------------------------ > TDCS epoch tdh_vp_enter tdh_mem_track > ------------------------------------------------------------ > SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation) > tdh_mem_range_block > ------------------------------------------------------------ > SEPT entry tdh_mem_range_block (Host lock) > tdh_mem_page_remove (Host lock) > tdg_mem_page_accept (Guest lock) > tdg_mem_page_attr_rd (Guest lock) > tdg_mem_page_attr_wr (Guest lock) > > Proposal: (as in patch 2) > - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only > once. > - During the retry, kick off all vCPUs and prevent any vCPU from > entering to avoid potential contentions. > > This is because tdh_vp_enter() and TDCALLs are not protected by > KVM mmu_lock, and remove_external_spte() in KVM must not fail. The solution for group 3 actually doesn't look too bad at all to me. At least from code and complexity wise. It's pretty compact, doesn't add any locks, and limited to the tdx.c code. Although, I didn't evaluate the implementation correctness of tdx_no_vcpus_enter_start() and tdx_no_vcpus_enter_stop() yet. Were you able to test the fallback path at all? Can we think of any real situations where it could be burdensome? One other thing to note I think, is that group 3 are part of no-fail operations. The core KVM calling code doesn't have the understanding of a failure there. So in this scheme of not avoiding contention we have to succeed before returning, where group 1 and 2 can fail, so don't need the special fallback scheme.