On Mon, Oct 16, 2023 at 08:27:35AM -0500, Michael Roth wrote:
> +static bool early_rmptable_check(void)
> +{
> +	u64 rmp_base, rmp_size;
> +
> +	/*
> +	 * For early BSP initialization, max_pfn won't be set up yet, wait until
> +	 * it is set before performing the RMP table calculations.
> +	 */
> +	if (!max_pfn)
> +		return true;

This already says that this is called at the wrong point during init.

Right now we have

  early_identify_cpu -> early_init_amd -> early_detect_mem_encrypt

which runs only on the BSP, but then early_init_amd() is called in
init_amd() too, so that it takes care of the APs as well.

This ends up doing a lot of unnecessary work on each AP in
early_detect_mem_encrypt(), like calculating the RMP size over and over,
even though that needs to happen exactly once.

Is there any reason why this function cannot be moved to init_amd(),
where it'll do the normal, per-AP init?

And the stuff that needs to happen once, needs to be called once too.
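Something along the lines of this completely untested sketch is what I
have in mind - comparing against boot_cpu_data is just one way to limit
the one-time work to the BSP:

	static void init_amd(struct cpuinfo_x86 *c)
	{
		early_init_amd(c);

		...

		/* One-time work: do the RMP check on the BSP only. */
		if (c == &boot_cpu_data &&
		    cpu_has(c, X86_FEATURE_SEV_SNP) &&
		    !early_rmptable_check())
			setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
	}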
> +
> +	return snp_get_rmptable_info(&rmp_base, &rmp_size);
> +}
> +
>  static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  {
>  	u64 msr;
> @@ -659,6 +674,9 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
>  		if (!(msr & MSR_K7_HWCR_SMMLOCK))
>  			goto clear_sev;
>
> +		if (cpu_has(c, X86_FEATURE_SEV_SNP) && !early_rmptable_check())
> +			goto clear_snp;
> +
>  		return;
>
> clear_all:
> @@ -666,6 +684,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
> clear_sev:
>  		setup_clear_cpu_cap(X86_FEATURE_SEV);
>  		setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
> +clear_snp:
>  		setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>  	}
>  }

...

> +bool snp_get_rmptable_info(u64 *start, u64 *len)
> +{
> +	u64 max_rmp_pfn, calc_rmp_sz, rmp_sz, rmp_base, rmp_end;
> +
> +	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
> +	rdmsrl(MSR_AMD64_RMP_END, rmp_end);
> +
> +	if (!(rmp_base & RMP_ADDR_MASK) || !(rmp_end & RMP_ADDR_MASK)) {
> +		pr_err("Memory for the RMP table has not been reserved by BIOS\n");
> +		return false;
> +	}

If you're masking off bits 0-12 above...

> +
> +	if (rmp_base > rmp_end) {

... why aren't you using the masked vars further on?

I know, the hw will say, yeah, those bits are 0, but still. IOW, do:

	rmp_base &= RMP_ADDR_MASK;
	rmp_end  &= RMP_ADDR_MASK;

after reading them.

> +		pr_err("RMP configuration not valid: base=%#llx, end=%#llx\n", rmp_base, rmp_end);
> +		return false;
> +	}
> +
> +	rmp_sz = rmp_end - rmp_base + 1;
> +
> +	/*
> +	 * Calculate the amount the memory that must be reserved by the BIOS to
> +	 * address the whole RAM, including the bookkeeping area. The RMP itself
> +	 * must also be covered.
> +	 */
> +	max_rmp_pfn = max_pfn;
> +	if (PHYS_PFN(rmp_end) > max_pfn)
> +		max_rmp_pfn = PHYS_PFN(rmp_end);
> +
> +	calc_rmp_sz = (max_rmp_pfn << 4) + RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	if (calc_rmp_sz > rmp_sz) {
> +		pr_err("Memory reserved for the RMP table does not cover full system RAM (expected 0x%llx got 0x%llx)\n",
> +		       calc_rmp_sz, rmp_sz);
> +		return false;
> +	}
> +
> +	*start = rmp_base;
> +	*len = rmp_sz;
> +
> +	return true;
> +}
> +
> +static __init int __snp_rmptable_init(void)
> +{
> +	u64 rmp_base, rmp_size;
> +	void *rmp_start;
> +	u64 val;
> +
> +	if (!snp_get_rmptable_info(&rmp_base, &rmp_size))
> +		return 1;
> +
> +	pr_info("RMP table physical address [0x%016llx - 0x%016llx]\n",

That's "RMP table physical range".

> +		rmp_base, rmp_base + rmp_size - 1);
> +
> +	rmp_start = memremap(rmp_base, rmp_size, MEMREMAP_WB);
> +	if (!rmp_start) {
> +		pr_err("Failed to map RMP table addr 0x%llx size 0x%llx\n", rmp_base, rmp_size);

No need to dump rmp_base and rmp_size again here - you're dumping them
above.

> +		return 1;
> +	}
> +
> +	/*
> +	 * Check if SEV-SNP is already enabled, this can happen in case of
> +	 * kexec boot.
> +	 */
> +	rdmsrl(MSR_AMD64_SYSCFG, val);
> +	if (val & MSR_AMD64_SYSCFG_SNP_EN)
> +		goto skip_enable;
> +
> +	/* Initialize the RMP table to zero */

Again: useless comment.

> +	memset(rmp_start, 0, rmp_size);
> +
> +	/* Flush the caches to ensure that data is written before SNP is enabled. */
> +	wbinvd_on_all_cpus();
> +
> +	/* MFDM must be enabled on all the CPUs prior to enabling SNP. */

First of all, use the APM bit name here pls: MtrrFixDramModEn.

And then, for the life of me, I can't find any mention in the APM why
this bit is needed. Neither in "15.36.2 Enabling SEV-SNP" nor in
"15.34.3 Enabling SEV".

Looking at the bit definitions of WrMem and RdMem - read and write
requests get directed to system memory instead of MMIO - I guess you
don't want MMIO writes to be possible for certain physical ranges when
SNP is enabled, but it'll be good to have this properly explained
instead of with a "this must happen" information-less sentence.

> +	on_each_cpu(mfd_enable, NULL, 1);
> +
> +	/* Enable SNP on all CPUs. */

Useless comment.

> +	on_each_cpu(snp_enable, NULL, 1);
> +
> +skip_enable:
> +	rmp_start += RMPTABLE_CPU_BOOKKEEPING_SZ;
> +	rmp_size -= RMPTABLE_CPU_BOOKKEEPING_SZ;
> +
> +	rmptable_start = (struct rmpentry *)rmp_start;
> +	rmptable_max_pfn = rmp_size / sizeof(struct rmpentry) - 1;
> +
> +	return 0;
> +}
> +
> +static int __init snp_rmptable_init(void)
> +{
> +	int family, model;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
> +		return 0;
> +
> +	family = boot_cpu_data.x86;
> +	model  = boot_cpu_data.x86_model;

Looks useless - just use boot_cpu_data directly below.

As mentioned here already:

  https://lore.kernel.org/all/Y9ubi0i4Z750gdMm@xxxxxxx/

And I already mentioned that for v9:

  https://lore.kernel.org/r/20230621094236.GZZJLGDAicp1guNPvD@fat_crate.local

Next time I'm NAKing this patch until you incorporate all review
comments or give a technical reason why you disagree with them.

> +	/*
> +	 * RMP table entry format is not architectural and it can vary by processor and
> +	 * is defined by the per-processor PPR. Restrict SNP support on the known CPU
> +	 * model and family for which the RMP table entry format is currently defined for.
> +	 */
> +	if (family != 0x19 || model > 0xaf)
> +		goto nosnp;
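IOW, with the local vars gone, that check simply becomes:

	if (boot_cpu_data.x86 != 0x19 || boot_cpu_data.x86_model > 0xaf)
		goto nosnp;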
> +
> +	if (amd_iommu_snp_enable())
> +		goto nosnp;
> +
> +	if (__snp_rmptable_init())
> +		goto nosnp;
> +
> +	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
> +
> +	return 0;
> +
> +nosnp:
> +	setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
> +	return -ENOSYS;
> +}
> +
> +/*
> + * This must be called after the PCI subsystem. This is because amd_iommu_snp_enable()
> + * is called to ensure the IOMMU supports the SEV-SNP feature, which can only be
> + * called after subsys_initcall().
> + *
> + * NOTE: IOMMU is enforced by SNP to ensure that hypervisor cannot program DMA
> + * directly into guest private memory. In case of SNP, the IOMMU ensures that
> + * the page(s) used for DMA are hypervisor owned.
> + */
> +fs_initcall(snp_rmptable_init);

This looks backwards. AFAICT, the IOMMU code should call arch code to
enable SNP at the right time, not the other way around - arch code
calling driver code. Especially if the SNP enablement depends on some
exact IOMMU init_state:

	if (init_state > IOMMU_ENABLED) {
		pr_err("SNP: Too late to enable SNP for IOMMU.\n");
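IOW, something like this completely untested sketch - the helper name is
hypothetical and snp_rmptable_init() would need to be un-static'ed, it is
just to illustrate the direction: the IOMMU init state machine calls into
arch code once it has reached the proper state:

	/* in the IOMMU driver, once init_state has reached IOMMU_ENABLED: */
	if (amd_iommu_snp_supported())	/* hypothetical helper */
		snp_rmptable_init();	/* arch code does the actual enablement */

instead of arch code poking at driver state from an initcall.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette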