On Wed, May 15, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-05-15 at 13:05 -0700, Sean Christopherson wrote:
> > On Wed, May 15, 2024, Rick P Edgecombe wrote:
> > > So rather than try to optimize zapping more someday and hit similar
> > > issues, let userspace decide how it wants it to be done. I'm not sure of
> > > the actual performance tradeoffs here, to be clear.
> >
> > ...unless someone is able to root cause the VFIO regression, we don't have
> > the luxury of letting userspace give KVM a hint as to whether it might be
> > better to do a precise zap versus a nuke-and-pave.
>
> Pedantry... I think it's not a regression if something requires a new flag. It
> is still a bug though.

Heh, pedantry denied.  I was speaking in the past tense about the VFIO failure,
which was a regression as I changed KVM behavior without adding a flag.

> The thing I worry about on the bug is whether it might have been due to a guest
> having access to a page it shouldn't have. In which case we can't give the user
> the opportunity to create it.
>
> I didn't gather there was any proof of this. Did you have any hunch either way?

I doubt the guest was able to access memory it shouldn't have been able to
access.  But that's a moot point, as the bigger problem is that, because we have
no idea what's at fault, KVM can't make any guarantees about the safety of such
a flag.

TDX is a special case where we don't have a better option (we do have other
options, they're just horrible).  In other words, the choice is essentially to
either:

 (a) cross our fingers and hope that the problem is limited to shared memory
     with QEMU+VFIO, i.e. that it doesn't affect TDX private memory,

or

 (b) don't merge TDX until the original regression is fully resolved.

FWIW, I would love to root cause and fix the failure, but I don't know how
feasible that is at this point.

> > And more importantly, it would be a _hint_, not the hard requirement that TDX
> > needs.
> >
> > > That said, a per-vm knob is easier for TDX purposes.
>
> If we don't want it to be a mandate from userspace, then we need to do some
> per-vm checking in TDX's case anyway. In which case we might as well go with
> the per-vm option for TDX.
>
> You had said up the thread, why not opt all non-normal VMs into the new
> behavior. It will work great for TDX. But why do SEV and others want this
> automatically?

Because I want flexibility in KVM, i.e. I want to take the opportunity to try
and break away from KVM's godawful ABI.  It might be a pipe dream, as keying off
the VM type obviously has similar risks to giving userspace a memslot flag.  The
one sliver of hope is that the VM types really are quite new (though less so for
SEV and SEV-ES), whereas a memslot flag would be easily applied to existing VMs.