On Tue, Nov 26, 2024 at 08:46:51AM +0800, Edgecombe, Rick P wrote: > On Thu, 2024-11-21 at 19:51 +0800, Yan Zhao wrote: > > ==proposal details== > > > > The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs > > together which are required for page installation and uninstallation. > > > > These SEAMCALLs can be divided into three groups: > > Group 1: tdh_mem_page_add(). > > The SEAMCALL is invoked only during TD build time and therefore > > KVM has ensured that no contention will occur. > > > > Proposal: (as in patch 1) > > Just return error when TDX_OPERAND_BUSY is found. > > > > Group 2: tdh_mem_sept_add(), tdh_mem_page_aug(). > > These two SEAMCALLs are invoked for page installation. > > They return TDX_OPERAND_BUSY when contending with tdh_vp_enter() > > (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(), > > tdg_mem_page_attr_rd/wr(). > > We did verify with TDX module folks that the TDX module could be changed to not > take the sept host priority lock for zero entries (that happen during the guest > operations). In that case, I think we shouldn't expect contention for > tdh_mem_sept_add() and tdh_mem_page_aug() from them? We still need it for > tdh_vp_enter() though. Ah, you are right. I previously incorrectly thought TDX module will avoid locking free entries for tdg_mem_page_accept() only. > > > > > > Proposal: (as in patch 1) > > - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY > > to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault(). > > > > - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as > > long as there are no pending signals/interrupts. > > > > The retry inside TDX aims to reduce the count of tdh_vp_enter() > > before resolving EPT violations in the local vCPU, thereby > > minimizing contentions with other vCPUs. However, it can't > > completely eliminate 0-step mitigation as it exits when there're > > pending signals/interrupts and does not (and cannot) remove the > > tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT. > > > > Resources SHARED users EXCLUSIVE users > > ------------------------------------------------------------ > > SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation) > > tdh_mem_page_aug > > ------------------------------------------------------------ > > SEPT entry tdh_mem_sept_add (Host lock) > > tdh_mem_page_aug (Host lock) > > tdg_mem_page_accept (Guest lock) > > tdg_mem_page_attr_rd (Guest lock) > > tdg_mem_page_attr_wr (Guest lock) > > > > Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove(). > > These three SEAMCALLs are invoked for page uninstallation, with > > KVM mmu_lock held for writing. > > > > Resources SHARED users EXCLUSIVE users > > ------------------------------------------------------------ > > TDCS epoch tdh_vp_enter tdh_mem_track > > ------------------------------------------------------------ > > SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation) > > tdh_mem_range_block > > ------------------------------------------------------------ > > SEPT entry tdh_mem_range_block (Host lock) > > tdh_mem_page_remove (Host lock) > > tdg_mem_page_accept (Guest lock) > > tdg_mem_page_attr_rd (Guest lock) > > tdg_mem_page_attr_wr (Guest lock) > > > > Proposal: (as in patch 2) > > - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only > > once. > > - During the retry, kick off all vCPUs and prevent any vCPU from > > entering to avoid potential contentions. > > > > This is because tdh_vp_enter() and TDCALLs are not protected by > > KVM mmu_lock, and remove_external_spte() in KVM must not fail. > > The solution for group 3 actually doesn't look too bad at all to me. At least > from code and complexity wise. It's pretty compact, doesn't add any locks, and > limited to the tdx.c code. Although, I didn't evaluate the implementation > correctness of tdx_no_vcpus_enter_start() and tdx_no_vcpus_enter_stop() yet. > > Were you able to test the fallback path at all? Can we think of any real > situations where it could be burdensome? Yes, I did some negative tests to fail block/track/remove to check if the tdx_no_vcpus_enter*() work. Even without those negative tests, it's not rare for tdh_mem_track() to return busy due to its contention with tdh_vp_enter(). Given that normally it's not frequent to find tdh_mem_range_block() or tdh_mem_page_remove() to return busy (if we reduce the frequency of zero-step mitigation) and that tdh_mem_track() will kick off all vCPUs later any way, I think it's acceptable to do the tdx_no_vcpus_enter*() stuffs in the page removal path. > One other thing to note I think, is that group 3 are part of no-fail operations. > The core KVM calling code doesn't have the understanding of a failure there. So > in this scheme of not avoiding contention we have to succeed before returning, > where group 1 and 2 can fail, so don't need the special fallback scheme. Yes, exactly.