On Mon, Feb 13, 2023 at 1:07 PM Sagi Shahar <sagis@xxxxxxxxxx> wrote:
>
> TL;DR
> Marking an SEV VM as dead after intra-host migration prevents cleanly tearing
> down said VM.
>
> We are testing our POC code for TDX copyless migration and noticed some
> issues. We are currently using a scheme similar to the one used for
> SEV, where the VM is marked as dead after the migration is completed,
> which prevents any other IOCTLs from being issued on the VM.
>
> From what we are seeing, there are at least two IOCTLs that the VMM
> issues on the source VM after the migration is completed. The first
> one is KVM_IOEVENTFD, for unwiring an eventfd used for the NVMe admin
> queue during the NVMe device unplug sequence. The second IOCTL is
> KVM_SET_USER_MEMORY_REGION, for removing the memslots during VM
> destruction. Failing either of these IOCTLs causes the migration to
> fail.
>
> I ran two simple experiments to test the behavior of these IOCTLs.
>
> The first experiment was removing the code which marks the VM as dead.
> After a small fix to the TDX migration logic, it looks like both IOCTLs
> can succeed even after the VM got migrated.
>
> The second experiment was to always return success if the VM is marked as
> dead. This simulates the case where these IOCTLs never get called from
> the VMM after the migration.
>
> In both cases I'm seeing the same behavior in the overall migration
> process. I'm getting a cgroup-related error where it fails to delete
> the CGROUP_CPU file, but my guess is that this is not related to the
> IOCTL issue.
>
> I can see 3 options:
>
> 1) If we want to keep the vm_dead logic as is, this means changing the
> VMM code in some pretty hacky way. We will need to distinguish between
> a regular VM shutdown and a VM shutdown after migration. We will also
> need to make absolutely sure that we don't leave any dangling data in
> the kernel by skipping some of the cleanup stages.
>
> 2) If we want to remove the vm_dead logic, we can simply not mark the
> VM as dead after migration. It looks like it will just work, but it might
> create special cases where IOCTLs can be called on a TD which isn't
> valid anymore. From what I can tell, some of these code paths are
> already protected by a check for whether an hkid is assigned, so it
> might not be a big issue. I'm not sure how this will work for SEV, but
> I'm guessing there's a similar mechanism there as well.
>
> 3) We can also go halfway and only block certain memory-encryption-related
> IOCTLs if the VM got migrated. This will likely require more
> changes when we try to push this upstream, since it will require adding
> a new field for vm_mem_enc_dead (or something similar) in addition to
> the current vm_bugged and vm_dead.
>
> Personally, I don't want to go with option (1) since it sounds quite
> risky to make these kinds of changes without fully understanding all
> the possible side effects.
>
> I prefer either option (2) or (3), but I don't know which one will be
> more acceptable to the community.

I agree option (2) or (3) seems preferable. Option (2) sounds good to me;
I am not sure why we needed to disable all IOCTLs on the source VM after
the migration in the first place. I was just incorporating feedback from
the review. In SEV we have the ASID, which is similar to the HKID. I don't
think the SEV code paths are already protected the way you mention TDX's
are, but that seems like a simple enough fix.

Or maybe it's better to introduce a new VM_MOVED state, like VM_BUGGED and
VM_DEAD, which allows most IOCTLs but just disables running vCPUs.

What about option (4): remove the VM_DEAD marking on migration and do
nothing else. Userspace can easily make errors which cause the VM to be
unusable. Does this error path really need support from KVM? I don't
think it would violate the guest's security properties for SEV or SEV-ES.
With SEV we can already use the mirror VM functionality to add more
vCPUs, and with SEV-ES the source vCPUs are unusable after the migration
because they lack the VMSAs that are part of the guest from launch time.