Re: [PATCH v6 08/42] x86/sev-es: initialize sev_status/features within #VC handler

Michael Roth <michael.roth@xxxxxxx> · Thu, 21 Oct 2021 16:34:27 -0500

On Thu, Oct 21, 2021 at 04:48:16PM +0200, Borislav Petkov wrote:
> On Wed, Oct 20, 2021 at 11:10:23AM -0500, Michael Roth wrote:
> > At which point we then switch to using the CPUID table? But at that
> > point all the previous CPUID checks, both SEV-related/non-SEV-related,
> > are now possibly not consistent with what's in the CPUID table. Do we
> > then revalidate?
> 
> Well, that's a tough question. That's basically the same question as,
> does Linux support heterogeneous cores and can it handle hardware
> features which get enabled after boot. The perfect example is, late
> microcode loading which changes CPUID bits and adds new functionality.
> 
> And the answer to that is, well, hard. You need to decide this on a
> case-by-case basis.
> 
> But isn't it that the SNP CPUID page will be parsed early enough anyway
> so that kernel proper will see only SNP CPUID info and init properly
> using that?

At the time I wrote that I thought you were suggesting moving the SNP CPUID
table initialization to where sme_enable() is in current upstream, so it
seemed worth mentioning, but since the idea was actually to move all the
sev_status initialization in sme_enable() earlier in the code to where
SNP CPUID table init needs to happen (before first cpuid calls are made), I
this scenario is avoided.

> 
> > Even a non-malicious hypervisor might provide inconsistent values
> > between the two sources due to bugs, or SNP validation suppressing
> > certain feature bits that hypervisor otherwise exposes, etc.
> 
> There's also migration, lemme point to a very recent example:
> 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fr%2F20211021104744.24126-1-jane.malalane%40citrix.com&amp;data=04%7C01%7Cmichael.roth%40amd.com%7C4aaa998fcd134c8d054608d994a1d1aa%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637704245057316093%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=OKO9o3YzKwRkyWPpam2%2Fxn4aRSMKtPEnZjn05g81SP8%3D&amp;reserved=0
> 
> which is exactly what you say - a non-malicious HV taking care of its
> migration pool. So how do you handle that?

I concur with David's assessment on that solution being compatible with
CPUID enforcement policy. But it's certainly something to consider more
generally.

Fortunately I think I misspoke earlier, I thought there was a case or 2
where bits were suppressed, rather than causing a validation failure,
but looking back through the PPR I doesn't seem like that's actually the
case. Which is good, since that would indeed be painful to deal with in
the context of migration.