Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit

Michael Roth <michael.roth@xxxxxxx> · Wed, 6 Dec 2023 08:46:05 -0600

On Wed, Dec 06, 2023 at 02:41:13PM +0100, Paolo Bonzini wrote:
> On Tue, Dec 5, 2023 at 11:28 PM Michael Roth <michael.roth@xxxxxxx> wrote:
> > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> >      env->gdt.limit = sregs.gdt.limit;
> >      env->gdt.base = sregs.gdt.base;
> >
> > +    cr0_old = env->cr[0];
> >      env->cr[0] = sregs.cr0;
> >      env->cr[2] = sregs.cr2;
> >      env->cr[3] = sregs.cr3;
> >      env->cr[4] = sregs.cr4;
> >
> >      env->efer = sregs.efer;
> > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +            env->efer |= MSR_EFER_LMA;
> > +        }
> > +    }
> 
> There is no need to check cr0_old or sev_es_enabled(); EFER.LMA is
> simply EFER.LME && CR0.PG.

Yah, I originally had it like that, but svm_set_cr0() in the kernel only
sets it in vcpu->arch.efer it when setting CR0.PG, so I thought it might
be safer to be more selective and mirror that handling on the QEMU side
so we can leave as much of any other sanity checks on kernel/QEMU side
intact as possible. E.g., if some other bug in the kernel ends up
unsetting EFER.LMA while paging is still enabled, we'd still notice that
when passing it back in via KVM_SET_SREGS*.

But agree it's simpler to just always set it based on CR0.PG and EFER.LMA
and can send a v3 if that's preferred.

> 
> Alternatively, sev_es_enabled() could be an assertion, that is:
> 
>     if ((env->efer & MSR_EFER_LME) && (env->cr[0] & CR0_PG_MASK) &&
>        !(env->efer & MSR_EFER_LMA)) {
>         /* Workaround for... */
>         assert(sev_es_enabled());
>         env->efer |= MSR_EFER_LMA;
>     }
> 
> What do you think?

I'm a little apprehensive about this approach for similar reasons as
above. The current patch is guaranteed to only affect SEV-ES, whereas
this approach could trigger assertions for other edge-cases we aren't
aware of that could further impact the release. For instance, "in
theory", QEMU or KVM might have some handling where EFER.LMA is set
somewhere after (or outside of) KVM_GET_SREGS, but now with this
proposed change QEMU would become more restrictive and generate an
assert for those cases.

I don't think that's actually the case, but in the off-chance that such
a case exists there could be more fall-out such as further delays, or
the need for a stable fix. But no strong opinion there either if that
ends up being the preferred approach, just trying to consider the
pros/cons.

Thanks,

Mike

> 
> Thanks,
> 
> Paolo
> 
> >      /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
> >      x86_update_hflags(env);
> > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >  {
> >      CPUX86State *env = &cpu->env;
> >      struct kvm_sregs2 sregs;
> > +    target_ulong cr0_old;
> >      int i, ret;
> >
> >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> >      env->gdt.limit = sregs.gdt.limit;
> >      env->gdt.base = sregs.gdt.base;
> >
> > +    cr0_old = env->cr[0];
> >      env->cr[0] = sregs.cr0;
> >      env->cr[2] = sregs.cr2;
> >      env->cr[3] = sregs.cr3;
> >      env->cr[4] = sregs.cr4;
> >
> >      env->efer = sregs.efer;
> > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > +            env->efer |= MSR_EFER_LMA;
> > +        }
> > +    }
> >
> >      env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
> >
> > --
> > 2.25.1
> >
>