On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > But there is a possibility that the guest will attempt to access the response
> > > > PFN before/during that reporting and spin on an #NPF instead though. So
> > > > maybe the safer more repeatable approach is to handle the error directly
> > > > from KVM and propagate it to userspace.
> > >
> > > I was thinking more along the lines of KVM marking the VM as dead/bugged.
> >
> > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > but we could additionally flag the guest as dead.
>
> Honest question, does it make sense from KVM to make the VM unusable?  E.g. is
> it feasible for userspace to keep running the VM?  Does the page that's in a bad
> state present any danger to the host?

If the reclaim fails (which it shouldn't), then KVM has a unique situation
where a non-gmem guest page is in a bad state. In theory, if the
guest/userspace could somehow induce a reclaim failure, then they could
potentially trick the host into trying to access that same page as a shared
page and induce a host RMP #PF.

So it does seem like a good idea to force the guest to stop executing. Then
once the guest is fully destroyed the bad page will stay leaked, so it won't
affect subsequent activities.

>
> > Is there an existing mechanism for this?
>
> kvm_vm_dead()

Nice, that would do the trick. I'll modify the logic to also call that after
a reclaim failure.

Thanks,

Mike
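
A minimal user-space sketch of the control flow discussed above, not the actual patch: if reclaiming the response page fails, flag the VM as dead and propagate the error. Everything here is a hypothetical stand-in. `struct kvm_model`, `model_vm_dead()`, `model_page_reclaim()` and `model_finish_guest_request()` only model the idea behind calling kvm_vm_dead() after a reclaim failure; none of them are the real KVM structures or helpers, and the `inject_failure` flag exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm; only the field this sketch needs. */
struct kvm_model {
	bool vm_dead;
};

/* Models the effect of kvm_vm_dead(): mark the VM so it stops executing. */
static void model_vm_dead(struct kvm_model *kvm)
{
	assert(kvm != NULL);
	kvm->vm_dead = true;
}

/*
 * Hypothetical reclaim stub. Returns 0 on success or -EIO on failure.
 * inject_failure is an assumption of this sketch, used to force the
 * "should never happen" path.
 */
static int model_page_reclaim(uint64_t pfn, bool inject_failure)
{
	(void)pfn;
	return inject_failure ? -EIO : 0;
}

/*
 * Reclaim the response page after a guest request. On failure the page
 * is left in a bad state, so mark the VM dead and return the error.
 */
static int model_finish_guest_request(struct kvm_model *kvm,
				      uint64_t resp_pfn, bool inject_failure)
{
	int ret = model_page_reclaim(resp_pfn, inject_failure);

	if (ret) {
		model_vm_dead(kvm);
		return ret;
	}

	return 0;
}
```

In this model a successful reclaim leaves `vm_dead` clear, while a failed one sets it and returns the error to the caller, which matches the "stop the guest, leak the page" behaviour described in the reply.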