On Mon, 2023-03-06 at 17:34 +0100, Juergen Gross wrote: > When running virtualized, MTRR access can be reduced (e.g. in Xen PV > guests or when running as a SEV-SNP guest under Hyper-V). Typically > the hypervisor will reset the MTRR feature in CPUID data, resulting > in no MTRR memory type information being available for the kernel. > > This has turned out to result in problems: > > - Hyper-V SEV-SNP guests using uncached mappings where they shouldn't > - Xen PV dom0 mapping memory as WB which should be UC- instead > > Solve those problems by supporting to set a static MTRR state, > overwriting the empty state used today. In case such a state has been > set, don't call get_mtrr_state() in mtrr_bp_init(). The set state > will only be used by mtrr_type_lookup(), as in all other cases > mtrr_enabled() is being checked, which will return false. Accept the > overwrite call only for selected cases when running as a guest. > Disable X86_FEATURE_MTRR in order to avoid any MTRR modifications by > just refusing them. > > [...] > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c > index ee09d359e08f..49b4cc923312 100644 > --- a/arch/x86/kernel/cpu/mtrr/generic.c > +++ b/arch/x86/kernel/cpu/mtrr/generic.c > @@ -8,10 +8,12 @@ > #include <linux/init.h> > #include <linux/io.h> > #include <linux/mm.h> > - > +#include <linux/cc_platform.h> > #include <asm/processor-flags.h> > #include <asm/cacheinfo.h> > #include <asm/cpufeature.h> > +#include <asm/hypervisor.h> > +#include <asm/mshyperv.h> Is <asm/mshyperv.h> needed here? > #include <asm/tlbflush.h> > #include <asm/mtrr.h> > #include <asm/msr.h> > @@ -240,6 +242,48 @@ static u8 mtrr_type_lookup_variable(u64 start, u64 end, u64 *partial_end, > return mtrr_state.def_type; > } > > +/** > + * mtrr_overwrite_state - set static MTRR state > + * > + * Used to set MTRR state via different means (e.g. with data obtained from > + * a hypervisor). +KVM list and KVM maintainers, IIUC in the next patch, SEV-SNP guest only sets a synthetic MTRR w/o telling the hypervisor (hyperv). I think this works for SEV-SNP running on top of hyperv because they have mutual understanding? What about the SNP guest running on other hypervisors such as KVM? Since this code covers TDX guest too, I think eventually it makes sense for TDX guest to use this function too (to avoid #VE IIUC). If want to do that, then I think TDX guest should have the same mutual understanding with *ALL* hypervisor, as I am not sure what's the point of making the TDX guest's MTRR behaviour depending on specific hypervisor. For now I don't see there's any use case for TDX guest to use non-WB memory type (in fact, KVM always maps guest memory as WB if there's no non-coherent DMA to the guest memory), so to me it seems it's OK to make a universal mutual understanding that TDX guest will always have WB memory type for all memory. But, I am not sure whether it's better to have a standard hypercall between guest & hypervisor for this purpose so things can be more flexible? > + * Is allowed only for special cases when running virtualized. Must be called > + * from the x86_init.hyper.init_platform() hook. X86_FEATURE_MTRR must be off. > + */ > +void mtrr_overwrite_state(struct mtrr_var_range *var, unsigned int num_var, > + mtrr_type def_type) > +{ > + unsigned int i; > + > + if (WARN_ON(mtrr_state_set || > + hypervisor_is_type(X86_HYPER_NATIVE) || > + !cpu_feature_enabled(X86_FEATURE_HYPERVISOR) || > + (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP) && > + !hv_is_isolation_supported() && > + !cpu_feature_enabled(X86_FEATURE_XENPV) && > + !cpu_feature_enabled(X86_FEATURE_TDX_GUEST)))) > + return; > + > + /* Disable MTRR in order to disable MTRR modifications. */ > + setup_clear_cpu_cap(X86_FEATURE_MTRR); > + > + if (var) { > + if (num_var > MTRR_MAX_VAR_RANGES) { > + pr_warn("Trying to overwrite MTRR state with %u variable entries\n", > + num_var); > + num_var = MTRR_MAX_VAR_RANGES; > + } > + for (i = 0; i < num_var; i++) > + mtrr_state.var_ranges[i] = var[i]; > + num_var_ranges = num_var; > + } > + > + mtrr_state.def_type = def_type; > + mtrr_state.enabled |= MTRR_STATE_MTRR_ENABLED; > + > + mtrr_state_set = 1; > +} > + > /** > * mtrr_type_lookup - look up memory type in MTRR > * > diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c > index 7596ebeab929..5fe62ee0361b 100644 > --- a/arch/x86/kernel/cpu/mtrr/mtrr.c > +++ b/arch/x86/kernel/cpu/mtrr/mtrr.c > @@ -666,6 +666,15 @@ void __init mtrr_bp_init(void) > const char *why = "(not available)"; > unsigned int phys_addr; > > + if (mtrr_state.enabled) { > + /* Software overwrite of MTRR state, only for generic case. */ > + mtrr_calc_physbits(true); > + init_table(); > + pr_info("MTRRs set to read-only\n"); > + > + return; > + } > + > phys_addr = mtrr_calc_physbits(boot_cpu_has(X86_FEATURE_MTRR)); > > if (boot_cpu_has(X86_FEATURE_MTRR)) { > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 16babff771bd..0cccfeb67c3a 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -1037,6 +1037,8 @@ void __init setup_arch(char **cmdline_p) > /* > * VMware detection requires dmi to be available, so this > * needs to be done after dmi_setup(), for the boot CPU. > + * For some guest types (Xen PV, SEV-SNP, TDX) it is required to be > + * called before cache_bp_init() for setting up MTRR state. > */ > init_hypervisor_platform(); >