Re: [PATCH v8 29/40] x86/compressed/64: add support for SEV-SNP CPUID table in #VC handlers

Michael Roth <michael.roth@xxxxxxx> · Tue, 18 Jan 2022 19:18:06 -0600

On Tue, Jan 18, 2022 at 12:49:30PM -0600, Michael Roth wrote:
> On Tue, Jan 18, 2022 at 06:41:16PM +0100, Borislav Petkov wrote:
> > On Tue, Jan 18, 2022 at 11:20:43AM -0600, Michael Roth wrote:
> > > The HV fills out the initial contents of the CPUID page, which includes
> > > the count. SNP/PSP firmware will validate the contents the HV tries to put
> > > in the initial page, but does not currently enforce that the 'count' field
> > > is non-zero.
> > 
> > And regardless, what if the HV fakes the count - how do you figure out
> > what the proper count is? You go and read the whole CPUID page and try
> > to make sense of what's there, even beyond the "last" function leaf.
> 
> 

<snip>

> > 
> > But see above, how do you check whether the HV hasn't "hidden" some
> > entries by modifying the count field?
> > 
> > Either I'm missing something or this sounds really weird...
> 
> ...count must match the actual number of entries in the table in all
> cases.

Turns out in my testing earlier there was a separate check that was
causing the PSP to fail, so I re-tested the behavior, and things are
actually a bit more interesting, but nothing too concerning:

If 'fake_count'/'reported_count' is greater than the actual number of
entries in the table, 'actual_count', then all table entries up to
'fake_count' will also need to pass validation. Generally the table
will be zero'd out initially, so those additional/bogus entries will
be interpreted as a CPUID leaves where all fields are 0. Unfortunately,
that's still considered a valid leaf, even if it's a duplicate of the
*actual* 0x0 leaf present earlier in the table. The current code will
handle this fine, since it scans the table in order, and uses the
valid 0x0 leaf earlier in the table.

This is isn't really a special case though, it falls under the general
category of a hypervisor inserting garbage entries that happen to pass
validation, but don't reflect values that a guest would normally see.
This will be detectable as part of guest owner attestation, since the
guest code is careful to guarantee that the values seen after boot,
once the attestation stage is reached, will be identical to the values
seen during boot, so if this sort of manipulation of CPUID values
occurred, the guest owner will notice this during attestation, and can
abort the boot at that point. The Documentation patch addresses this
in more detail.

If 'fake_count' is less than 'actual_count', then the PSP skips
validation for anything >= 'fake_count', and leaves them in the table.
That should also be fine though, since guest code should never exceed
'fake_count'/'reported_count', as that's a blatant violation of the
spec, and it doesn't make any sense for a guest to do this. This will
effectively 'hide' entries, but those resulting missing CPUID leaves
will be noticeable to the guest owner once attestation phase is
reached.

This does all highlight the need for some very thorough guidelines
on how a guest owner should implement their attestation checks for
cpuid, however. I think a section in the reference implementation
notes/document that covers this would be a good starting point. I'll
also check with the PSP team on tightening up some of these CPUID
page checks to rule out some of these possibilities in the future.

> in the table. count==0 is only special in that code might erroneously
> decide to treat it as an indicator that cpuid table isn't enabled at
> all, but since that case causes termination it should actually be ok.
> 
> Though I wonder if we should do something like this to still keep
> callers from relying on checking count==0 directly:
> 
>   static const struct snp_cpuid_info *
>   snp_cpuid_info_get_ptr(void)
>   {
>           const struct snp_cpuid_info *cpuid_info;
>           void *ptr;
>   
>           /*
>            * This may be called early while still running on the initial identity
>            * mapping. Use RIP-relative addressing to obtain the correct address
>            * in both for identity mapping and after switch-over to kernel virtual
>            * addresses.
>            */
>           asm ("lea cpuid_info_copy(%%rip), %0"
>                : "=r" (ptr)
>                : "p" (&cpuid_info_copy));
>   
>           cpuid_info = ptr;
>           if (cpuid_info->count == 0)
>                   return NULL
>   
>           return cpuid_info;
>   }

Nevermind, that doesn't work since snp_cpuid_info_get_ptr() is also called
by snp_cpuid_info_get_ptr() *prior* to initializing the table, so it ends
seeing cpuid->count==0 and fails right away. So your initial suggestion
of checking cpuid->count==0 at the call-sites to determine if the table
is enabled is probably the best option.

Sorry for the noise/confusion.