On Thu, Jun 16, 2022, Tom Lendacky wrote:
> On 6/14/22 14:52, Sean Christopherson wrote:
> > On Tue, Jun 14, 2022, Tom Lendacky wrote:
> > > On 6/14/22 11:13, Sean Christopherson wrote:
> > > > > > > This breaks SME on Rome and Milan when compiling with clang-13. I haven't been
> > > > > > > able to figure out exactly what goes wrong. printk isn't functional at this point,
> > > > > > > and interactive debug during boot on our test systems is beyond me. I can't even
> > > > > > > verify that the bug is specific to clang because the draconian build system for our
> > > > > > > test systems apparently is stuck pointing at gcc-4.9.
> > > > > > >
> > > > > > > I suspect the issue is related to relocation and/or encrypting memory, as skipping
> > > > > > > the call to early_snp_set_memory_shared() if SNP isn't active masks the issue.
> > > > > > > I've dug through the assembly and haven't spotted a smoking gun, e.g. no obvious
> > > > > > > use of absolute addresses.
> > > > > > >
> > > > > > > Forcing a VM through the same path doesn't fail. I can't test an SEV guest at the
> > > > > > > moment because INIT_EX is also broken.
> > >
> > > I'm not sure if there's a way to remove the jump table optimization for
> > > the arch/x86/coco/core.c file when retpolines aren't configured.
> >
> > And for post-boot I don't think we'd want to disable any such optimizations.
> >
> > A possible "fix" would be to do what sme_encrypt_kernel() does and just query
> > sev_status directly. But even if that works, the fragility of the boot code is
> > terrifying :-( I can't think of any clever solutions though.
>
> I worry that another use of cc_platform_has() could creep in at some point
> and cause the same issue. Not sure how bad it would be, performance-wise, to
> remove the jump table optimization for arch/x86/coco/core.c.
One thought would be to initialize "vendor" to a bogus value, disallow calls to
cc_set_vendor() until after the kernel has gotten to a safe point, and then WARN
(or panic?) if cc_platform_has() is called before "vendor" is explicitly set.
New calls can still get in, but they'll be much easier to detect and less likely
to escape initial testing.

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index 49b44f881484..803220cd34a6 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -13,7 +13,11 @@
 #include <asm/coco.h>
 #include <asm/processor.h>
 
-static enum cc_vendor vendor __ro_after_init;
+/*
+ * Initialize the vendor to garbage to detect usage of cc_platform_has() before
+ * the vendor has been set.
+ */
+static enum cc_vendor vendor __ro_after_init = CC_NR_VENDORS;
 static u64 cc_mask __ro_after_init;
 
 static bool intel_cc_platform_has(enum cc_attr attr)
@@ -90,7 +94,10 @@ bool cc_platform_has(enum cc_attr attr)
 		return intel_cc_platform_has(attr);
 	case CC_VENDOR_HYPERV:
 		return hyperv_cc_platform_has(attr);
+	case CC_VENDOR_NONE:
+		return false;
 	default:
+		WARN_ONCE(1, "blah blah blah");
 		return false;
 	}
 }
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 3d98c3a60d34..adfd2fbce7ac 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -9,6 +9,7 @@ enum cc_vendor {
 	CC_VENDOR_AMD,
 	CC_VENDOR_HYPERV,
 	CC_VENDOR_INTEL,
+	CC_NR_VENDORS,
 };
 
 void cc_set_vendor(enum cc_vendor v);

> I guess we can wait for Boris to get back and chime in.
>
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index bd4a34100ed0..5efab0d8e49d 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -127,7 +127,9 @@ static bool __head check_la57_support(unsigned long physaddr)
> >  }
> >  #endif
> >  
> > -static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> > +static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
> > +						    pmdval_t *pmd,
> > +						    unsigned long physaddr)
>
> I noticed that you added the physaddr parameter but never use it...

Likely just garbage on my end, I was trying various ideas.