On Sat, Feb 05, 2022 at 11:54:01AM +0100, Borislav Petkov wrote: > On Fri, Jan 28, 2022 at 11:17:52AM -0600, Brijesh Singh wrote: > > +/* > > + * Individual entries of the SEV-SNP CPUID table, as defined by the SEV-SNP > > + * Firmware ABI, Revision 0.9, Section 7.1, Table 14. > > + */ > > +struct snp_cpuid_fn { > > + u32 eax_in; > > + u32 ecx_in; > > + u64 xcr0_in; > > + u64 xss_in; > > So what's the end result here: > > -+ u64 __unused; > -+ u64 __unused2; > ++ u64 xcr0_in; > ++ u64 xss_in; > > those are not unused fields anymore but xcr0 and xss input values? > > Looking at the FW abi doc, they're only mentioned in "Table 14. > CPUID_FUNCTION Structure" that they're XCR0 and XSS at the time of the > CPUID execution. > > But those values are input values to what exactly, guest or firmware? These are inputs to firmware, either through a MSG_CPUID_REQ guest message that gets passed along by the HV to firmware for validation, or, in this case, through SNP_LAUNCH_UPDATE when the HV passes the initial CPUID table contents to firmware for validation before it gets mapped into guest memory. > > There's a typo in the FW doc, btw: > > "The guest constructs an MSG_CPUID_REQ message as defined in Table 13. > This message contains an array of CPUID function structures as defined > in Table 13." > > That second "Table" is 14 not 13. > > So, if an array CPUID_FUNCTION[] is passed as part of an MSG_CPUID_REQ > command, then, the two _IN variables contain what the guest received > from the HV for XCR0 and XSS values. Which means, this is the guest > asking the FW whether those values the HV gave the guest are kosher. > > Am I close? Most of that documentation is written in the context of the MSG_CPUID_REQ guest message interface. In that case, a guest has been provided a certain sets of CPUID values by KVM CPUID instruction emulation (or potentially the CPUID table), and wants firmware to validate whether all/some of them are valid for that particular system/configuration. In the case of 0xD subfunction 0x0 and 0x1, the CPUID leaf on real hardware will change depending on what the guest sets in its actual XCR0 control register (APM Volume 2 11.5.2) or IA32_XSS_MSR register, whose combined bits are OR'd to form the XFEATURE_ENABLED_MASK. CPUID leaf 0xD subfunctions 0x0 and 0x1 advertise the size of the XSAVE area required for the features currently enabled in XFEATURE_ENABLED_MASK via the EBX return value from the corresponding CPUID instruction, so for firmware to validate whether that EBX value is valid for a particular system/configuration, it needs to know what the guest's currently XFEATURE_ENABLED_MASK is, so it needs to know what the guest has set in its XCR0 and IA32_XSS_MSR registers. That's where the XCR0_IN and XSS_IN inputs come into play: they are inputs to the firmware so it can do the proper validation based on the guest's current state / XFEATURE_ENABLED_MASK. But that guest message interface will likely be deprecated at some point in favor of the static CPUID table (SNP FW ABI 8.14.2.6), since all the firmware validation has already been done in advance, and any additional validation is redundant, so this series relies purely on the CPUID table. In the case of the CPUID table, we don't know what the guest's XFEATURE_ENABLED_MASK is going to be at the time a CPUID instruction is issued for 0xD:0x0,0xD,0x1, and those values will change as the guest boots and begins enabling additional XSAVE features. The only thing that's certain is that there needs to be at least one CPUID table entry for 0xD:0x0 and 0xD:0x1 corresponding to the initial XFEATURE_ENABLED_MASK, which will correspond to XCR0_IN=1/XSS_IN=0, or XCR0_IN=3/XSS_IN=0 depending on the hypervisor/guest implementation. What to do about other combinations of XCR0_IN/XSS_IN gets a bit weird, since encoding all those permutations would not scale well, and alternative methods of trying to encode them in the table would almost certainly result in lots of different incompatibile guest implementations floating around. So our recommendation has been for hypervisors to encode only the 0xD:0x0/0xD:0x1 entries corresponding to these initial guest states in the CPUID table, and for guests to ignore the XCR0_IN/XSS_IN values completely, and instead have the guest calculate the XSAVE save area size dynamically (EBX). This has multiple benefits: 1) Guests that implement this approach would be compatible even with hypervisors that try to encode additional permutations of XCR0_IN/XSS_IN in the table based on their read of the spec 2) It is just as secure, since it requires only that the guest trust itself and the APM specification, which is the ultimate authority on what firmware would be validationa against 3) The other leaf registers, EAX/ECX/EDX are not dependent on XCR0_IN/XSS_IN/XFEATURE_ENABLED_MASK, and so any 0xD:0x0/0xD:0x1 entry will do as the starting point for the calculation. That's why in v8 these fields were documented as unused1/unused2. But when you suggested that we should enforce 0 in that case, it became clear we should adjust our recommendation to go ahead and check for the specific XCR0_IN={1,3}/XSS_IN=0 values when searching for the base 0xD:0x0/0xD:0x1 entry to use, because a hypervisor that chooses to use XCR0_IN/XSS_IN is not out-of-spec, and should be supported, and there isn't really any read of the spec from the hypervisor perspective where XCR0_IN=0/XSS_IN=0 would be valid. So for v9 I went ahead and added the XCR0_IN/XSS_IN checks, and updated our recommendations (will send an updated version to SNP mailing list by Monday) to not ignore them in the guest. I added some comments in snp_cpuid_get_validated_func() and snp_cpuid_calc_xsave_size() to clarify how XCR0_IN/XSS_IN are being used there, but let me know if those need to be beefed up a bit. -Mike