On Mon, Feb 13, 2023 at 1:07 PM Sagi Shahar <sagis@xxxxxxxxxx> wrote:
>
> TL;DR
> Marking an SEV VM as dead after intra-host migration prevents cleanly tearing
> down said VM.
>
> We are testing our POC code for TDX copyless migration and noticed some
> issues. We are currently using a scheme similar to the one used for
> SEV, where the VM is marked as dead after the migration is completed,
> which prevents any other IOCTLs from being issued on the VM.
>
> From what we are seeing, there are at least two IOCTLs that the VMM
> issues on the source VM after the migration is completed. The first
> one is KVM_IOEVENTFD, for unwiring an eventfd used for the NVMe admin
> queue during the NVMe device unplug sequence. The second IOCTL is
> KVM_SET_USER_MEMORY_REGION, for removing the memslots during VM
> destruction. Failing either of these IOCTLs causes the migration to
> fail.
>
> I ran two simple experiments to test the behavior of these IOCTLs.
>
> The first experiment was removing the code which marks the VM as dead.
> After a small fix to the TDX migration logic, it looks like both IOCTLs
> can succeed even after the VM got migrated.
>
> The second experiment was to always return success if the VM is marked as
> dead. This simulates the case where these IOCTLs never get called from
> the VMM after the migration.
>
> In both cases I'm seeing the same behavior in the overall migration
> process. I'm getting a cgroup-related error where it fails to delete
> the CGROUP_CPU file, but my guess is that this is not related to the
> IOCTL issue.
>
> I can see 3 options:
>
> 1) If we want to keep the vm_dead logic as is, this means changing the
> VMM code in some pretty hacky way. We will need to distinguish between
> a regular VM shutdown and a VM shutdown after migration. We will also
> need to make absolutely sure that we don't leave any dangling data in
> the kernel by skipping some of the cleanup stages.
>
> 2) If we want to remove the vm_dead logic, we can simply not mark the
> VM as dead after migration. It looks like it will just work, but it might
> create special cases where IOCTLs can be called on a TD which isn't
> valid anymore. From what I can tell, some of these code paths are
> already protected by a check for whether an hkid is assigned, so it
> might not be a big issue. I'm not sure how this will work for SEV, but
> I'm guessing there's a similar mechanism there as well.
>
> 3) We can also go halfway and only block certain memory-encryption-related
> IOCTLs if the VM got migrated. This will likely require more
> changes when we try to push this upstream, since it will require adding
> a new field for vm_mem_enc_dead (or something similar) in addition to
> the current vm_bugged and vm_dead.
>
> Personally, I don't want to go with option (1) since it sounds quite
> risky to make these kinds of changes without fully understanding all
> the possible side effects.
>
> I prefer either option (2) or (3), but I don't know which one will be
> more acceptable to the community.

I agree option (2) or (3) seems preferable. Option (2) sounds good to me;
I am not sure why we needed to disable all IOCTLs on the source VM after
the migration in the first place. I was just incorporating feedback from
the review. In SEV we have the ASID, which is similar to the HKID. I don't
think the SEV code paths are already protected the way you mention TDX's
are, but that seems like a simple enough fix.

Or maybe it's better to introduce a new VM_MOVED state, like VM_BUGGED and
VM_DEAD, which allows most IOCTLs but just disables running vCPUs.

What about option (4): remove the VM_DEAD marking on migration and do
nothing else. Userspace can easily make errors which cause the VM to be
unusable. Does this error path really need support from KVM? I don't
think it would violate the guest's security properties for SEV or SEV-ES.
With SEV we can already use the mirror VM functionality to add more
vCPUs, and with SEV-ES the source vCPUs are unusable after the migration
because they lack the VMSAs that are part of the guest from launch time.