Hello Steve,

Let me first answer those queries which I can do immediately ...

On Fri, Feb 05, 2021 at 06:54:21PM -0800, Steve Rutherford wrote:
> On Thu, Feb 4, 2021 at 7:08 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> >
> > Hello Steve,
> >
> > On Thu, Feb 04, 2021 at 04:56:35PM -0800, Steve Rutherford wrote:
> > > On Wed, Feb 3, 2021 at 4:39 PM Ashish Kalra <Ashish.Kalra@xxxxxxx> wrote:
> > > >
> > > > From: Ashish Kalra <ashish.kalra@xxxxxxx>
> > > >
> > > > Add new KVM_FEATURE_SEV_LIVE_MIGRATION feature for guest to check
> > > > for host-side support for SEV live migration. Also add a new custom
> > > > MSR_KVM_SEV_LIVE_MIGRATION for guest to enable the SEV live migration
> > > > feature.
> > > >
> > > > Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx>
> > > > ---
> > > >  Documentation/virt/kvm/cpuid.rst     |  5 +++++
> > > >  Documentation/virt/kvm/msr.rst       | 12 ++++++++++++
> > > >  arch/x86/include/uapi/asm/kvm_para.h |  4 ++++
> > > >  arch/x86/kvm/svm/sev.c               | 13 +++++++++++++
> > > >  arch/x86/kvm/svm/svm.c               | 16 ++++++++++++++++
> > > >  arch/x86/kvm/svm/svm.h               |  2 ++
> > > >  6 files changed, 52 insertions(+)
> > > >
> > > > diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> > > > index cf62162d4be2..0bdb6cdb12d3 100644
> > > > --- a/Documentation/virt/kvm/cpuid.rst
> > > > +++ b/Documentation/virt/kvm/cpuid.rst
> > > > @@ -96,6 +96,11 @@ KVM_FEATURE_MSI_EXT_DEST_ID        15          guest checks this feature bit
> > > >                                                     before using extended destination
> > > >                                                     ID bits in MSI address bits 11-5.
> > > >
> > > > +KVM_FEATURE_SEV_LIVE_MIGRATION     16          guest checks this feature bit before
> > > > +                                               using the page encryption state
> > > > +                                               hypercall to notify the page state
> > > > +                                               change
> > > > +
> > > >  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
> > > >                                                 per-cpu warps are expected in
> > > >                                                 kvmclock
> > > >
> > > > diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> > > > index e37a14c323d2..020245d16087 100644
> > > > --- a/Documentation/virt/kvm/msr.rst
> > > > +++ b/Documentation/virt/kvm/msr.rst
> > > > @@ -376,3 +376,15 @@ data:
> > > >  	write '1' to bit 0 of the MSR, this causes the host to re-scan its queue
> > > >  	and check if there are more notifications pending. The MSR is available
> > > >  	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
> > > > +
> > > > +MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +        0x4b564d08
> > > > +
> > > > +	Control SEV Live Migration features.
> > > > +
> > > > +data:
> > > > +        Bit 0 enables (1) or disables (0) host-side SEV Live Migration feature,
> > > > +        in other words, this is guest->host communication that it's properly
> > > > +        handling the shared pages list.
> > > > +
> > > > +        All other bits are reserved.
> > > > diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> > > > index 950afebfba88..f6bfa138874f 100644
> > > > --- a/arch/x86/include/uapi/asm/kvm_para.h
> > > > +++ b/arch/x86/include/uapi/asm/kvm_para.h
> > > > @@ -33,6 +33,7 @@
> > > >  #define KVM_FEATURE_PV_SCHED_YIELD	13
> > > >  #define KVM_FEATURE_ASYNC_PF_INT	14
> > > >  #define KVM_FEATURE_MSI_EXT_DEST_ID	15
> > > > +#define KVM_FEATURE_SEV_LIVE_MIGRATION	16
> > > >
> > > >  #define KVM_HINTS_REALTIME	0
> > > >
> > > > @@ -54,6 +55,7 @@
> > > >  #define MSR_KVM_POLL_CONTROL	0x4b564d05
> > > >  #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
> > > >  #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
> > > > +#define MSR_KVM_SEV_LIVE_MIGRATION	0x4b564d08
> > > >
> > > >  struct kvm_steal_time {
> > > >  	__u64 steal;
> > > > @@ -136,4 +138,6 @@ struct kvm_vcpu_pv_apf_data {
> > > >  #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
> > > >  #define KVM_PV_EOI_DISABLED 0x0
> > > >
> > > > +#define KVM_SEV_LIVE_MIGRATION_ENABLED BIT_ULL(0)
> > > > +
> > > >  #endif /* _UAPI_ASM_X86_KVM_PARA_H */
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index b0d324aed515..93f42b3d3e33 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -1627,6 +1627,16 @@ int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +void sev_update_migration_flags(struct kvm *kvm, u64 data)
> > > > +{
> > > > +	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> > > > +
> > > > +	if (!sev_guest(kvm))
> > > > +		return;
> > >
> > > This should assert that userspace wanted the guest to be able to make
> > > these calls (see more below).
> > > >
> > > > +
> > > > +	sev->live_migration_enabled = !!(data & KVM_SEV_LIVE_MIGRATION_ENABLED);
> > > > +}
> > > > +
> > > >  int svm_get_shared_pages_list(struct kvm *kvm,
> > > >  			      struct kvm_shared_pages_list *list)
> > > >  {
> > > > @@ -1639,6 +1649,9 @@ int svm_get_shared_pages_list(struct kvm *kvm,
> > > >  	if (!sev_guest(kvm))
> > > >  		return -ENOTTY;
> > > >
> > > > +	if (!sev->live_migration_enabled)
> > > > +		return -EINVAL;
>
> This is currently under guest control, so I'm not certain this is
> helpful. If I called this with otherwise valid parameters, and got
> back -EINVAL, I would probably think the bug is on my end. But it
> could be on the guest's end! I would probably drop this, but you could
> have KVM return an empty list of regions when this happens.

You will get -EINVAL till the guest kernel has booted to a specific
point, because live migration is enabled only after the guest kernel
has checked for host-side support for live migration and also for
OVMF/UEFI support for live migration, as per this guest kernel code
below:

arch/x86/kernel/kvm.c:

kvm_init_platform() -> check_kvm_sev_migration()
..

arch/x86/mm/mem_encrypt.c:

void __init check_kvm_sev_migration(void)
{
	if (sev_active() &&
	    kvm_para_has_feature(KVM_FEATURE_SEV_LIVE_MIGRATION)) {
		unsigned long nr_pages;

		pr_info("KVM enable live migration\n");
		sev_live_migration_enabled = true;

		/*
		 * Ensure that _bss_decrypted section is marked as decrypted in
		 * the shared pages list.
		 */
		nr_pages = DIV_ROUND_UP(__end_bss_decrypted - __start_bss_decrypted,
					PAGE_SIZE);
		early_set_mem_enc_dec_hypercall((unsigned long)__start_bss_decrypted,
						nr_pages, 0);

		/*
		 * If not booted using EFI, enable Live migration support.
		 */
		if (!efi_enabled(EFI_BOOT))
			wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION,
			       KVM_SEV_LIVE_MIGRATION_ENABLED);
	} else {
		pr_info("KVM enable live migration feature unsupported\n");
	}
}

Later, setup_kvm_sev_migration(), invoked via a late initcall, checks
that live migration is supported (set up above) and that UEFI/OVMF
supports live migration, and then enables live migration on the host
via the wrmsrl():

static int __init setup_kvm_sev_migration(void)
{
	efi_char16_t efi_sev_live_migration_enabled[] = L"SevLiveMigrationEnabled";
	efi_guid_t efi_variable_guid = MEM_ENCRYPT_GUID;
	efi_status_t status;
	unsigned long size;
	bool enabled;

	/*
	 * check_kvm_sev_migration() invoked via kvm_init_platform() before
	 * this callback would have set up the indicator that the live
	 * migration feature is supported/enabled.
	 */
	if (!sev_live_migration_enabled)
		return 0;

	if (!efi_enabled(EFI_RUNTIME_SERVICES)) {
		pr_info("%s : EFI runtime services are not enabled\n", __func__);
		return 0;
	}

	size = sizeof(enabled);

	/* Get variable contents into buffer */
	status = efi.get_variable(efi_sev_live_migration_enabled,
				  &efi_variable_guid, NULL, &size, &enabled);

	if (status == EFI_NOT_FOUND) {
		pr_info("%s : EFI live migration variable not found\n", __func__);
		return 0;
	}

	if (status != EFI_SUCCESS) {
		pr_info("%s : EFI variable retrieval failed\n", __func__);
		return 0;
	}

	if (enabled == 0) {
		pr_info("%s: live migration disabled in EFI\n", __func__);
		return 0;
	}

	pr_info("%s : live migration enabled in EFI\n", __func__);

	wrmsrl(MSR_KVM_SEV_LIVE_MIGRATION, KVM_SEV_LIVE_MIGRATION_ENABLED);

	return 0;
}

This ensures that host/guest live migration negotiation is completed
only when both host and guest have support for it, and UEFI/OVMF
supports it too.

Please note that live migration cannot be initiated before this
negotiation is complete, which makes sense, as we don't want to enable
it till the host/guest negotiation is complete and UEFI/OVMF support
for it has been checked.
So there is this window, till the guest is booting and till the late
initcall is invoked, during which live migration cannot be initiated,
and the KVM_GET_SHARED_PAGES ioctl will return -EINVAL till then.

> Alternatively, as explained below, this could call guest_pv_has instead.
> > > > +
> > > >
> > > >  	if (!list->size)
> > > >  		return -EINVAL;
> > > >
> > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > > index 58f89f83caab..43ea5061926f 100644
> > > > --- a/arch/x86/kvm/svm/svm.c
> > > > +++ b/arch/x86/kvm/svm/svm.c
> > > > @@ -2903,6 +2903,9 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
> > > >  		svm->msr_decfg = data;
> > > >  		break;
> > > >  	}
> > > > +	case MSR_KVM_SEV_LIVE_MIGRATION:
> > > > +		sev_update_migration_flags(vcpu->kvm, data);
> > > > +		break;
> > > >  	case MSR_IA32_APICBASE:
> > > >  		if (kvm_vcpu_apicv_active(vcpu))
> > > >  			avic_update_vapic_bar(to_svm(vcpu), data);
> > > > @@ -3976,6 +3979,19 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > >  		vcpu->arch.cr3_lm_rsvd_bits &= ~(1UL << (best->ebx & 0x3f));
> > > >  	}
> > > >
> > > > +	/*
> > > > +	 * If SEV guest then enable the Live migration feature.
> > > > +	 */
> > > > +	if (sev_guest(vcpu->kvm)) {
> > > > +		struct kvm_cpuid_entry2 *best;
> > > > +
> > > > +		best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
> > > > +		if (!best)
> > > > +			return;
> > > > +
> > > > +		best->eax |= (1 << KVM_FEATURE_SEV_LIVE_MIGRATION);
> > > > +	}
> > > > +
> > >
> > > Looking at this, I believe the only way for this bit to get enabled is
> > > if userspace toggles it. There needs to be a way for userspace to
> > > identify if the kernel underneath them does, in fact, support SEV LM.
> > > I'm at risk for having misread these patches (it's a long series), but
> > > I don't see anything that communicates upwards.
> > >
> > > This could go upward with the other paravirt features flags in
> > > cpuid.c.
> > > It could also be an explicit KVM Capability (checked through
> > > check_extension).
> > >
> > > Userspace should then have a chance to decide whether or not this
> > > should be enabled. And when it's not enabled, the host should return a
> > > GP in response to the hypercall. This could be configured either
> > > through userspace stripping out the LM feature bit, or by calling a VM
> > > scoped enable cap (KVM_VM_IOCTL_ENABLE_CAP).
> > >
> > > I believe the typical path for a feature like this to be configured
> > > would be to use ENABLE_CAP.
> >
> > I believe we have discussed and reviewed this earlier too.
> >
> > To summarize this feature, the host indicates if it supports the Live
> > Migration feature and the feature and the hypercall are only enabled on
> > the host when the guest checks for this support and does a wrmsrl() to
> > enable the feature. Also the guest will not make the hypercall if the
> > host does not indicate support for it.
>
> I've gone through and read this patch a bit more closely, and the
> surrounding code. Previously, I clearly misread this and the
> surrounding space.
>
> What happens if the guest just writes to the MSR anyway? Even if it
> didn't receive a cue to do so? I believe the hypercall would still get
> invoked here, since the hypercall does not check if SEV live migration
> is enabled. Similarly, the MSR for enabling it is always available,
> even if userspace didn't ask for the cpuid bit to be set. This should
> not happen. Userspace should be in control of a new hypercall rolling
> out.

No, the guest will not invoke the hypercall until live migration
support has been enabled on the guest, as I described above. The
hypercall invocation code does this check as shown below:

static void set_memory_enc_dec_hypercall(unsigned long vaddr, int npages,
					 bool enc)
{
	unsigned long sz = npages << PAGE_SHIFT;
	unsigned long vaddr_end, vaddr_next;

	if (!sev_live_migration_enabled)
		return;
...
...
Thanks,
Ashish

>
> I believe my interpretation last time was that the cpuid bit was
> getting surfaced from the host kernel to host userspace, but I don't
> actually see that in this patch series. Another way to ask this
> question would be "How does userspace know the kernel they are on has
> this patch series?". It needs some way of checking whether or not the
> kernel underneath it supports SEV live migration. Technically, I think
> userspace could call get_cpuid, set_cpuid (with the same values), and
> then get_cpuid again, and it would be able to infer by checking the
> SEV LM feature flag in the KVM leaf. This seems a bit kludgy. Checking
> support should be easy.
>
> An additional question is "how does userspace choose whether live
> migration is advertised to the guest"? I believe userspace's desire
> for a particular value of the paravirt feature flag in CPUID gets
> overridden when they call set cpuid, since the feature flag is set in
> svm_vcpu_after_set_cpuid regardless of what userspace asks for.
> Userspace should have a choice in the matter.
>
> Looking at similar paravirt-y features, there's precedent for another
> way of doing this (may be preferred over CHECK_EXTENSION/ENABLE_CAP?):
> this could call guest_pv_has before running the hypercall. The feature
> (KVM_FEATURE_SEV_LIVE_MIGRATION) would then need to be exposed with
> the other paravirt features in __do_cpuid_func. The function
> guest_pv_has would represent if userspace has decided to expose SEV
> live migration to the guest, and the sev->live_migration_enabled would
> indicate if the guest responded affirmatively to the CPUID bit.
>
> The downside of using guest_pv_has is that, if pv enforcement is
> disabled, guest_pv_has will always return true, which seems a bit odd
> for a non-SEV guest. This isn't a deal breaker, but seems a bit odd
> for say, a guest that isn't even running SEV. Using CHECK_EXTENSION
> and ENABLE_CAP sidesteps that. I'm also not certain I would call this
> a paravirt feature.
>
> > And these were your review comments on the above:
> > I see I misunderstood how the CPUID bits get passed
> > through: usermode can still override them. Forgot about the back and
> > forth for CPUID with usermode.
> >
> > So as you mentioned, userspace can still override these and it gets a
> > chance to decide whether or not this should be enabled.
> >
> > Thanks,
> > Ashish
>
> Thanks,
> Steve