On Wed, May 15, 2024 at 10:18:31PM +0200, Borislav Petkov wrote:
> So if I were to design this, I'd do it this way:
>
> 0. guest gets hw poison injected
>
> 1. it runs memory_failure() and it kills the processes using the page.
>
> 2. page is marked poisoned on the host so no other guest gets it.
>
> That's it. No second accesses whatsoever. At least this is how it works
> on baremetal.
>
> This hw poisoning emulation is just silly and unnecessary.

We (QEMU) haven't consumed this yet, but I think it makes sense to have
such emulation, because it's slightly different from a real hwpoison.

I think the important bit missing from this picture is migration: the VM
can migrate from one host to another, carrying that poisoned PFN with it.

Let's assume we have two hosts, src and dst, and the VM currently runs on
src.

Before migration, there is a real PFN that is bad, with an MCE injected.
When it is accessed by either a guest vcpu or the host cpu / hypervisor,
the VM gets killed. So far this is the same as for any process that has a
bad page.

However, it's possible that the VM gets migrated _before_ that bad PFN is
accessed. In that case the VM is still legal to run. The hypervisor will
not migrate the data of that bad PFN, knowing that the data is invalid.
Instead, it'll tell dst that "this guest PFN is bad, if the guest accesses
it let's crash it".

What dst then needs is a way to describe "this guest PFN is bad". The
easiest way is to describe "this VA of the process is bad"; meanwhile
there'll be no real page backing that VA anyway, and also no real poisoned
page. We want to poison a VA only, and that's why an emulation is needed.
Besides that, we want to get exactly what we'd get for a real hwpoison,
e.g. SIGBUS with the address encoded, so that KVM works naturally with it
just like with a real MCE.

One other thing we could do is to inject poison into the VA together with
the page backing it, but that would turn a PFN on the dst host into a real
bad PFN that the dst OS can no longer use, so it's less optimal.
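To make the src/dst flow above concrete, here is a toy model in Python. It
is only a sketch: Host, migrate() and GuestPoisonAccess are made-up names
for illustration, not QEMU or kernel interfaces, and the exception merely
stands in for the SIGBUS (with the address encoded) that a real setup would
deliver.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size, for turning a PFN into an address


class GuestPoisonAccess(Exception):
    """Hypothetical stand-in for SIGBUS carrying the faulting address."""

    def __init__(self, gpa):
        super().__init__(f"access to poisoned guest address {gpa:#x}")
        self.gpa = gpa


@dataclass
class Host:
    """Toy host: guest memory as a dict plus a set of bad guest PFNs."""

    name: str
    pages: dict = field(default_factory=dict)    # guest PFN -> page data
    poisoned: set = field(default_factory=set)   # guest PFNs marked bad

    def read(self, pfn):
        # A poisoned PFN faults on access, whether or not a page backs it.
        if pfn in self.poisoned:
            raise GuestPoisonAccess(pfn * PAGE_SIZE)
        return self.pages[pfn]


def migrate(src, dst):
    """Copy good pages to dst; for bad PFNs, carry only the marker."""
    for pfn, data in src.pages.items():
        if pfn in src.poisoned:
            continue  # the data is invalid, so it is never transferred
        dst.pages[pfn] = data
    # dst records "this guest PFN is bad" without any backing page.
    dst.poisoned |= src.poisoned


if __name__ == "__main__":
    src = Host("src", pages={0: b"a", 1: b"junk", 2: b"c"}, poisoned={1})
    dst = Host("dst")
    migrate(src, dst)
    print("dst backs PFNs:", sorted(dst.pages))
    try:
        dst.read(1)
    except GuestPoisonAccess as exc:
        print("fault:", exc)
```

Note that in this model dst never holds a page for the bad PFN, which is
the point of poisoning the VA only: nothing on dst is sacrificed, yet the
guest still faults on access.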
Thanks,

-- 
Peter Xu