On Fri, May 10, 2024, Michael Roth wrote:
> On Fri, May 10, 2024 at 03:50:26PM +0200, Paolo Bonzini wrote:
> > On Fri, May 10, 2024 at 3:47 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > + * Since software-protected VMs don't have a notion of a shared vs.
> > > > + * private that's separate from what KVM is tracking, the above
> > > > + * KVM_EXIT_MEMORY_FAULT condition wouldn't occur, so avoid the
> > > > + * special handling for that case for now.
> > >
> > > Very technically, it can occur if userspace _just_ modified the attributes.  And
> > > as I've said multiple times, at least for now, I want to avoid special casing
> > > SW-protected VMs unless it is *absolutely* necessary, because their sole purpose
> > > is to allow testing flows that are impossible to exercise without SNP/TDX hardware.
> >
> > Yep, it is not like they have to be optimized.
>
> Ok, I thought there were maybe some future plans to use sw-protected VMs
> to get some added protections from userspace.  But even then there'd
> probably still be extra considerations for how to handle access tracking,
> so white-listing them probably isn't right anyway.
>
> I was also partly tempted to take this route because it would cover this
> TDX patch as well:
>
> https://lore.kernel.org/lkml/91c797997b57056224571e22362321a23947172f.1705965635.git.isaku.yamahata@xxxxxxxxx/

Hmm, I'm pretty sure that patch is trying to fix the exact same issue you are
fixing, just in a less precise way.  S-EPT entries only support RWX=0 and
RWX=111b, i.e. it should be impossible to have a write-fault to a present
S-EPT entry.

And if TDX is running afoul of this code:

	if (!fault->present)
		return !kvm_ad_enabled();

then KVM should do the sane thing and require A/D support be enabled for TDX.

And if it's something else entirely, that changelog has some explaining to do.
> and avoid any weirdness about checking kvm_mem_is_private() without
> checking mmu_invalidate_seq, but I think those cases all end up
> resolving themselves eventually and added some comments around that.

Yep, checking state that is protected by mmu_invalidate_seq outside of
mmu_lock is definitely allowed, e.g. the entire fast page fault path operates
outside of mmu_lock and thus outside of mmu_invalidate_seq's purview.

It's a-ok because the SPTE updates are done with an atomic CMPXCHG, and so KVM
only needs to ensure its page tables aren't outright _freed_.  If the zap
triggered by the attributes change "wins", then the fast #PF path will fail
the CMPXCHG and be an expensive NOP.  If the fast #PF wins, the zap will pave
over the fast #PF fix, and the IPI+flush that is needed for all zaps, to
ensure vCPUs don't have stale references, does the rest.

And if there's an attributes race that causes the fast #PF to bail early, the
vCPU will see the correct state on the next page fault.