On Thu, Feb 16, 2023 at 11:18 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Thu, Feb 16, 2023, Peter Gonda wrote: > > On Mon, Feb 13, 2023 at 1:07 PM Sagi Shahar <sagis@xxxxxxxxxx> wrote: > > > > > > TL;DR > > > Marking an SEV VM as dead after intra-host migration prevents cleanly tearing > > > down said VM. > > > > > > We are testing our POC code for TDX copyless migration and notice some > > > issues. We are currently using a similar scheme to the one used for > > > SEV where the VM is marked as dead after the migration is completed > > > which prevents any other IOCTLs from being triggered on the VM. > > > > > > From what we are seeing, there are at least 2 IOCTLs that VMM is > > > issuing on the source VM after the migration is completed. The first > > > one is KVM_IOEVENTFD for unwiring an eventfd used for the NVMe admin > > > queue during the NVMe device unplug sequence. The second IOCTL is > > > KVM_SET_USER_MEMORY_REGION for removing the memslots during VM > > > destruction. Failing any of these IOCTLs will cause the migration to > > > fail. > > Does the VMM _need_ to cleanly teardown the source VM? If so, why? > > > > I can see 3 options: > > > > > > 1) If we want to keep the vm_dead logic as is, this means changing to > > > VMM code in some pretty hacky way. We will need to distinguish between > > > regular VM shutdown to VM shutdown after migration. We will also need > > > to make absolutely sure that we don't leave any dangling data in the > > > kernel by skipping some of the cleanup stages. > > > > > > 2) If we want to remove the vm_dead logic we can simply not mark the > > > vm as dead after migration. It looks like it will just work but might > > > create special cases where IOCTLs can be called on a TD which isn't > > > valid anymore. From what I can tell, some of these code paths are > > > already protected by a check if hkid is assigned so it might not be a > > > big issue. Not sure how this will work for SEV but I'm guessing > > > there's a similar mechanism there as well. > > > > > > 3) We can also go half way and only block certain memory encryption > > > related IOCTLs if the VM got migrated. This will likely require more > > > changes when we try to push this ustream since it will require adding > > > a new field for vm_mem_enc_dead (or something similar) in addition to > > > the current vm_bugged and vm_dead. > > > > > > Personally, I don't want to go with option (1) since it sounds quite > > > risky to make these kind of changes without fully understanding all > > > the possible side effects. > > > > > > I prefer either option (2) or (3) but I don't know which one will be > > > more acceptable by the community. > > > > I agree option 2 or 3 seem preferable. Option two sounds good to me, I > > am not sure why we needed to disable all IOCLTs on the source VM after > > the migration. I was just taking feedback on the review. > > I don't like #2. For all intents and purposes, the source VM _is_ dead, or at > least zombified. It _was_ an SEV guest, but after migration it's no longer an > SEV guest, e.g. doesn't have a dedicated ASID, etc. But the CPUID state and a > pile of register state won't be coherent, especially on SEV-ES where KVM doesn't > have visibility into guest state. > > > We have the ASID similar to the HKID in SEV. I don't think the code > > paths are already protected like you mention TDX is but that seems > > like a simple enough fix. Or maybe it's better to introduce a new > > VM_MOVED like VM_BUGGED and VM_DEAD which allows most IOCTLs but just > > disables running vCPUs. > > I kinda like the idea of a VM_MOVED flag, but I'm a bit leary of it from a > a maintenance and ABI perspective. Definining and documenting what ioctls() > are/aren't allowed would get rather messy. The beauty of VM_DEAD is that it's > all or nothing. > > > What about option 4. Remove the VM_DEAD on migration and do nothing > > else. > > I'm strongly against doing nothing. It _might_ be safe from KVM's perspective, > but I would really prefer not to have to constantly think about whether or not a > given change is safe in the context of a zombified SEV guest. What if it was the responsibility of sev_vm_move_enc_context_from() (and I assume tdx_vm_move_enc_context_from) to put the VM and its vCPUs into a safe state with regard to KVM. That state can be completely broken for running the VM though as it would need to be for SEV-ES and TDX. > > > Userspace can easily make errors which cause the VM to be unusable. Does this > > error path really need support from KVM? > > As above, I'm concerned with KVM's safety, not the guest's. > > Depending on why the source VM needs to be cleaned up, one thought would be add > a dedicated ioctl(), e.g. KVM_DISMANTLE_VM, and make that the _only_ ioctl() that's > allowed to operate on a dead VM. The ioctl() would be defined as a best-effort > mechanism to teardown/free internal state, e.g. destroy KVM_PIO_BUS, KVM_MMIO_BUS, > and memslots, zap all SPTEs, etc...