Re: Issue with "KVM: SEV: Add support for SEV intra host migration"

Peter Gonda <pgonda@xxxxxxxxxx> · Thu, 16 Feb 2023 11:40:24 -0700

On Thu, Feb 16, 2023 at 11:18 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Feb 16, 2023, Peter Gonda wrote:
> > On Mon, Feb 13, 2023 at 1:07 PM Sagi Shahar <sagis@xxxxxxxxxx> wrote:
> > >
> > > TL;DR
> > > Marking an SEV VM as dead after intra-host migration prevents cleanly tearing
> > > down said VM.
> > >
> > > We are testing our POC code for TDX copyless migration and notice some
> > > issues. We are currently using a similar scheme to the one used for
> > > SEV where the VM is marked as dead after the migration is completed
> > > which prevents any other IOCTLs from being triggered on the VM.
> > >
> > > From what we are seeing, there are at least 2 IOCTLs that VMM is
> > > issuing on the source VM after the migration is completed. The first
> > > one is KVM_IOEVENTFD for unwiring an eventfd used for the NVMe admin
> > > queue during the NVMe device unplug sequence. The second IOCTL is
> > > KVM_SET_USER_MEMORY_REGION for removing the memslots during VM
> > > destruction. Failing any of these IOCTLs will cause the migration to
> > > fail.
>
> Does the VMM _need_ to cleanly teardown the source VM?  If so, why?
>
> > > I can see 3 options:
> > >
> > > 1) If we want to keep the vm_dead logic as is, this means changing to
> > > VMM code in some pretty hacky way. We will need to distinguish between
> > > regular VM shutdown to VM shutdown after migration. We will also need
> > > to make absolutely sure that we don't leave any dangling data in the
> > > kernel by skipping some of the cleanup stages.
> > >
> > > 2) If we want to remove the vm_dead logic we can simply not mark the
> > > vm as dead after migration. It looks like it will just work but might
> > > create special cases where IOCTLs can be called on a TD which isn't
> > > valid anymore. From what I can tell, some of these code paths are
> > > already  protected by a check if hkid is assigned so it might not be a
> > > big issue. Not sure how this will work for SEV but I'm guessing
> > > there's a similar mechanism there as well.
> > >
> > > 3) We can also go half way and only block certain memory encryption
> > > related IOCTLs if the VM got migrated. This will likely require more
> > > changes when we try to push this ustream since it will require adding
> > > a new field for vm_mem_enc_dead (or something similar) in addition to
> > > the current vm_bugged and vm_dead.
> > >
> > > Personally, I don't want to go with option (1) since it sounds quite
> > > risky to make these kind of changes without fully understanding all
> > > the possible side effects.
> > >
> > > I prefer either option (2) or (3) but I don't know which one will be
> > > more acceptable by the community.
> >
> > I agree option 2 or 3 seem preferable. Option two sounds good to me, I
> > am not sure why we needed to disable all IOCLTs on the source VM after
> > the migration. I was just taking feedback on the review.
>
> I don't like #2.  For all intents and purposes, the source VM _is_ dead, or at
> least zombified.  It _was_ an SEV guest, but after migration it's no longer an
> SEV guest, e.g. doesn't have a dedicated ASID, etc.  But the CPUID state and a
> pile of register state won't be coherent, especially on SEV-ES where KVM doesn't
> have visibility into guest state.
>
> > We have the ASID similar to the HKID in SEV. I don't think the code
> > paths are already protected like you mention TDX is but that seems
> > like a simple enough fix. Or maybe it's better to introduce a new
> > VM_MOVED like VM_BUGGED and VM_DEAD which allows most IOCTLs but just
> > disables running vCPUs.
>
> I kinda like the idea of a VM_MOVED flag, but I'm a bit leary of it from a
> a maintenance and ABI perspective.  Definining and documenting what ioctls()
> are/aren't allowed would get rather messy.  The beauty of VM_DEAD is that it's
> all or nothing.
>
> > What about option 4. Remove the VM_DEAD on migration and do nothing
> > else.
>
> I'm strongly against doing nothing.  It _might_ be safe from KVM's perspective,
> but I would really prefer not to have to constantly think about whether or not a
> given change is safe in the context of a zombified SEV guest.

What if it was the responsibility of sev_vm_move_enc_context_from()
(and I assume tdx_vm_move_enc_context_from) to put the VM and its
vCPUs into a safe state with regard to KVM. That state can be
completely broken for running the VM though as it would need to be for
SEV-ES and TDX.

>
> > Userspace can easily make errors which cause the VM to be unusable. Does this
> > error path really need support from KVM?
>
> As above, I'm concerned with KVM's safety, not the guest's.
>
> Depending on why the source VM needs to be cleaned up, one thought would be add
> a dedicated ioctl(), e.g. KVM_DISMANTLE_VM, and make that the _only_ ioctl() that's
> allowed to operate on a dead VM.  The ioctl() would be defined as a best-effort
> mechanism to teardown/free internal state, e.g. destroy KVM_PIO_BUS, KVM_MMIO_BUS,
> and memslots, zap all SPTEs, etc...