Re: [PATCH v2] x86/kexec: Add EFI config table identity mapping for kexec kernel

Michael Roth <michael.roth@xxxxxxx> · Fri, 7 Jul 2023 10:25:15 -0500

On Fri, Jul 07, 2023 at 10:57:12AM +0200, Borislav Petkov wrote:
> On Fri, Jul 07, 2023 at 10:22:56AM +0200, Joerg Roedel wrote:
> > On Fri, Jul 07, 2023 at 12:23:59PM +0800, Baoquan He wrote:
> > > I am wondering why we don't detect the cpu type and return early inside
> > > sev_enable() if it's Intel cpu.
> > > 
> > > We can't rely on CONFIG_AMD_MEM_ENCRYPT to decide if the code need be
> > > executed or not because we usually enable them all in distros.

If the SETUP_CC_BLOB config table entry isn't present, then none of that
code will run. I think the patch here addresses the main issue: kexec
has handed us a SETUP_EFI setup_data entry, with a pointer to a valid
config table, but we don't map it before accessing it. I should have
done a better job of checking for this, since there has been similar
cases exposed by the SEV checks, e.g.:

  commit b57feed2cc2622ae14b2fa62f19e973e5e0a60cf
  Author: Michael Roth <michael.roth@xxxxxxx>
  Date:   Tue Jul 5 21:53:15 2022 -0500

      x86/compressed/64: Add identity mappings for setup_data entries

but I don't think there's anything inherently wrong with accessing the
EFI config table in early boot. SEV is earliest user now, but something
else can come along. We certainly have early access to EFI system table
and other bootparams fields like ACPI RDSP in this neck of the woods.

So regardless of how we decide to handle sev_enable(), I think this patch
should be applied, rather than allowing this to lurk in the shadows until
it claims its next victim.

> > 
> > Looking at the code in head_64.S, by the time sev_enable() runs the SEV
> > bit should already be set in sev_status. Maybe use that to detect
> > whether SEV is enabled and bail out early?

sev_enabled() is actually what sets sev_status global, but for startup32
path, the SEV_ENABLED bit is artificially set earlier to force the C-bit
check in startup32_check_sev_cbit():

  fef81c8626: x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path

> 
> There was something about getting the CPUID page on SNP *before*
> actually calling CPUID but this is not the first time we had trouble in
> this area. This needs to be done differently.
> 
> Michael?

The main issue is we don't know when it is safe to access the SEV_STATUS
MSR until we've checked for the CPUID bit that advertises SEV
capability, otherwise we can generate a #GP on machines that don't
support it.

With SNP, the most reliable way to access this CPUID bit is via the SNP
CPUID table entry corresponding to 0x8000001F, rather than the emulated
values provided by hypervisor, like we do with SEV-ES. So that's how
things are implemented currently.

We *could* instead revert back to using the hypervisor-provided CPUID
values for 0x8000001F, as with SEV-ES, which would allow us to check
SEV_STATUS MSR *prior* to SNP CPUID table setup, instead of afterward...

What's awkward about that approach is that we'd effectively be bypassing
all the validation that the SNP firmware does on the 0x8000001F leaf. In
general this seems backwards, however:

  a) we still have the option of re-checking the 0x8000001F leaf once
     the SNP CPUID table is setup to see if the hypervisor presented
     something different to the guest than what we were expecting
  b) in this *specific* case, we only need to check for SEV CPUID bit
     so we know it is safe to check SEV_STATUS MSR. A malicious
     hypervisor can:

       i) trick us into causing a crash via the MSR access: but that's okay,
          security isn't compromised. hypervisor's fault, not ours.
      ii) trick us into thinking SEV is enabled so hypervisor can
          intercept/emulated the MSR access: but that's okay, because
          when SEV is enabled the SEV_STATUS MSR can't be intercepted,
          and if SEV isn't actually enabled, the guest wouldn't get past
          attestation.
     iii) trick us into thinking SEV isn't enabled, which should be fine
          if we leave sev_status unset.

So, yes, we can avoid checking for CC blob by relying on the above
details and reversing the ordering of CPUID table setup and initial
SEV_STATUS MSR checks. But we'd want to similarly audit any changes to these
paths so that an eventual case doesn't pop up where a malicious hypervisor
can cause a certain check/configuration to be bypassed by toying with CPUID
values at this stage of boot, which is why we've so far tried to maintain
the current handling.

It would be unfortunate if we finally abandoned this path because of the
issue being hit here though. I think the patch posted here is the proper
resolution to the issue being hit, and I'm hoping at this point we've
identified all the similar cases where EFI/setup_data-related structures
were missing explicit mappings. But if we still think it's too much of a
liability to access the EFI config table outside of SEV-enabled guests,
then I can work on re-implementing things based on the above logic.

-Mike

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec