On Thu, Jul 15, 2021, Dave Hansen wrote: > On 7/15/21 11:56 AM, Sean Christopherson wrote: > >>>> + /* Retry if another processor is modifying the RMP entry. */ > >>>> + do { > >>>> + /* Binutils version 2.36 supports the PSMASH mnemonic. */ > >>>> + asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF" > >>>> + : "=a"(ret) > >>>> + : "a"(spa) > >>>> + : "memory", "cc"); > >>>> + } while (ret == FAIL_INUSE); > >>> Should there be some retry limit here for safety? Or do we know that > >>> we'll never be stuck in this loop? Ditto for the loop in rmpupdate. > >> It's probably fine to just leave this. While you could *theoretically* > >> lose this race forever, it's unlikely to happen in practice. If it > >> does, you'll get an easy-to-understand softlockup backtrace which should > >> point here pretty quickly. > > But should failure here even be tolerated? The TDX cases spin on flows that are > > _not_ due to (direct) contenion, e.g. a pending interrupt while flushing the > > cache or lack of randomness when generating a key. In this case, there are two > > CPUs racing to modify the RMP entry, which implies that the final state of the > > RMP entry is not deterministic. > > I was envisioning that two different CPUs could try to smash two > *different* 4k physical pages, but collide since they share > a 2M page. > > But, in patch 33, this is called via: > > > + write_lock(&kvm->mmu_lock); > > + > > + switch (op) { > > + case SNP_PAGE_STATE_SHARED: > > + rc = snp_make_page_shared(vcpu, gpa, pfn, level); > ... > > Which should make collisions impossible. Did I miss another call-site? Ya, there's more, e.g. sev_snp_write_page_begin() and snp_handle_rmp_page_fault(), both of which run without holding mmu_lock. The PSMASH operation isn't too concerning, but the associated RMPUDATE is most definitely a concern, e.g. if two vCPUs are trying to access different variants of a page. It's ok if KVM's "response" in such a situation does weird things to the guest, but one of the two operations should "win", which I don't think is guaranteed if multiple RMP violations are racing. I'll circle back to this patch after I've gone through the KVM MMU changes.