a number of comments -- feel free to address or ignore each as you see fit:

On 08/13/14 21:09, Alex Williamson wrote:
> The SDM specifies (June 2014 Vol3 11.11.5):
>
> On a hardware reset, the P6 and more recent processors clear the
> valid flags in variable-range MTRRs and clear the E flag in the
> IA32_MTRR_DEF_TYPE MSR to disable all MTRRs. All other bits in the
> MTRRs are undefined.
>
> We currently do none of that, so whatever MTRR settings you had prior
> to reset is what you have after reset. Usually this doesn't matter
> because KVM often ignores the guest mappings and uses write-back
> anyway. However, if you have an assigned device and an IOMMU that
> allows NoSnoop for that device, KVM defers to the guest memory
> mappings which are now stale after reset. The result is that OVMF
> rebooting on such a configuration takes a full minute to LZMA
> decompress the EFI volume, a process that is nearly instant on the

For pedantry, instead of "EFI volume" we could say "LZMA-compressed
Firmware File System file in the FVMAIN_COMPACT firmware volume".

> initial boot.
>
> Add support for reseting the SDM defined bits on vCPU reset.
>
> Also, by my count we're already in danger of overflowing the entries
> array that we pass to KVM, so I've topped it up for a bit of headroom.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: qemu-stable@xxxxxxxxxx
> ---
>
>  target-i386/cpu.c |  6 ++++++
>  target-i386/cpu.h |  4 ++++
>  target-i386/kvm.c | 14 +++++++++++++-
>  3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 6d008ab..b5ae654 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2588,6 +2588,12 @@ static void x86_cpu_reset(CPUState *s)
>
>      env->xcr0 = 1;
>
> +    /* MTRR init - Clear global enable bit and valid bit in each variable reg */
> +    env->mtrr_deftype &= ~MSR_MTRRdefType_Enable;
> +    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> +        env->mtrr_var[i].mask &= ~MSR_MTRRphysMask_Valid;
> +    }
> +

I can see that the limit, MSR_MTRRcap_VCNT, is #defined as 8. Would you
be willing to update the definition of the "CPUX86State.mtrr_var" array
too, in "target-i386/cpu.h"? Currently it says:

    MTRRVar mtrr_var[8];

>  #if !defined(CONFIG_USER_ONLY)
>      /* We hard-wire the BSP to the first CPU. */
>      if (s->cpu_index == 0) {
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index e634d83..139890f 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -337,6 +337,8 @@
>  #define MSR_MTRRphysBase(reg) (0x200 + 2 * (reg))
>  #define MSR_MTRRphysMask(reg) (0x200 + 2 * (reg) + 1)
>
> +#define MSR_MTRRphysMask_Valid (1 << 11)
> +

Note: a signed integer (int32_t).

>  #define MSR_MTRRfix64K_00000 0x250
>  #define MSR_MTRRfix16K_80000 0x258
>  #define MSR_MTRRfix16K_A0000 0x259
> @@ -353,6 +355,8 @@
>
>  #define MSR_MTRRdefType 0x2ff
>
> +#define MSR_MTRRdefType_Enable (1 << 11)
> +

Note: a signed integer (int32_t).

Now, if you scroll back to the bit-clearing in x86_cpu_reset(), you see

    ~MSR_MTRRdefType_Enable

and

    ~MSR_MTRRphysMask_Valid

These expressions evaluate to negative int (int32_t) values (because the
bit-neg sets their sign bits).
Due to two's complement (which we are allowed to assume in qemu, see
HACKING), the negative int32_t values will be just correct for the next
step, when they are converted to uint64_t for the bit-ands, as part of
the usual arithmetic conversions. ("env->mtrr_deftype" and
"env->mtrr_var[i].mask" are uint64_t.) Mathematically this means an
addition of UINT64_MAX+1. ("Sign extended".)

In general, even though they are correct due to two's complement, I
dislike such detours into negative-valued signed integers by way of
bit-neg, because people are mostly unaware of them and assume they "just
work".

My preferred solution would be

    #define MSR_MTRRphysMask_Valid (1ull << 11)
    #define MSR_MTRRdefType_Enable (1ull << 11)

Feel free to ignore this of course.

>  #define MSR_CORE_PERF_FIXED_CTR0 0x309
>  #define MSR_CORE_PERF_FIXED_CTR1 0x30a
>  #define MSR_CORE_PERF_FIXED_CTR2 0x30b
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 097fe11..cb31338 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -79,6 +79,7 @@ static int lm_capable_kernel;
>  static bool has_msr_hv_hypercall;
>  static bool has_msr_hv_vapic;
>  static bool has_msr_hv_tsc;
> +static bool has_msr_mtrr;
>
>  static bool has_msr_architectural_pmu;
>  static uint32_t num_architectural_pmu_counters;
> @@ -739,6 +740,10 @@ int kvm_arch_init_vcpu(CPUState *cs)
>          env->kvm_xsave_buf = qemu_memalign(4096, sizeof(struct kvm_xsave));
>      }
>
> +    if (env->features[FEAT_1_EDX] & CPUID_MTRR) {
> +        has_msr_mtrr = true;
> +    }
> +

Seems to match "MTRR Feature Identification" in my (older) copy of the
SDM.
>      return 0;
>  }
>
> @@ -1183,7 +1188,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>      CPUX86State *env = &cpu->env;
>      struct {
>          struct kvm_msrs info;
> -        struct kvm_msr_entry entries[100];
> +        struct kvm_msr_entry entries[128];
>      } msr_data;
>      struct kvm_msr_entry *msrs = msr_data.entries;
>      int n = 0, i;
> @@ -1278,6 +1283,13 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>              kvm_msr_entry_set(&msrs[n++], HV_X64_MSR_REFERENCE_TSC,
>                                env->msr_hv_tsc);
>          }
> +        if (has_msr_mtrr) {
> +            kvm_msr_entry_set(&msrs[n++], MSR_MTRRdefType, env->mtrr_deftype);
> +            for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
> +                kvm_msr_entry_set(&msrs[n++],
> +                                  MSR_MTRRphysMask(i), env->mtrr_var[i].mask);
> +            }
> +        }
>
>          /* Note: MSR_IA32_FEATURE_CONTROL is written separately, see
>           * kvm_put_msr_feature_control. */
>

I think that this code is correct (and sufficient for the reset
problem), but I'm uncertain if it's complete:

(a) Shouldn't you put the matching PhysBase registers as well (for the
    variable range ones)? Plus, shouldn't you put mtrr_fixed[11] too
    (MSR_MTRRfix64K_00000, ...)?

(b) You only modify kvm_put_msrs(). What about kvm_get_msrs()? I can see
    that you make the msr putting dependent on:

        /*
         * The following MSRs have side effects on the guest or are too
         * heavy for normal writeback. Limit them to reset or full state
         * updates.
         */
        if (level >= KVM_PUT_RESET_STATE) {

    But that's probably not your reason for omitting matching new code
    from kvm_get_msrs(): "HV_X64_MSR_REFERENCE_TSC" is also heavy-weight
    (visible in your patch's context), but that one is nevertheless
    handled in kvm_get_msrs().

My only reason for (b) is simply symmetry. For example, commit 48a5f3bc
added HV_X64_MSR_REFERENCE_TSC at once to both put() and get().

According to "target-i386/machine.c", mtrr_deftype and co. are even
migrated (part of vmstate), so this asymmetry could become a problem in
migration. Eg.
source host doesn't fetch MTRR state from KVM, hence wire format carries
garbage, but on the target you put (part of) that garbage (right now,
just the mask) back into KVM:

  do_savevm()
    qemu_savevm_state()
      qemu_savevm_state_complete()
        cpu_synchronize_all_states()
          cpu_synchronize_state()
            kvm_cpu_synchronize_state()
              do_kvm_cpu_synchronize_state()
                kvm_arch_get_registers()
                  kvm_get_msrs()

  do_loadvm()
    load_vmstate()
      qemu_loadvm_state()
        cpu_synchronize_all_post_init()
          cpu_synchronize_post_init()
            kvm_cpu_synchronize_post_init()
              kvm_arch_put_registers(..., KVM_PUT_FULL_STATE)
                kvm_put_msrs(..., KVM_PUT_FULL_STATE)

  /* state subset modified during VCPU reset */
  #define KVM_PUT_RESET_STATE 2

  /* full state set, modified during initialization or on vmload */
  #define KVM_PUT_FULL_STATE 3

Hence I suspect (a) and (b) should be handled.

... And then we arrive at cross-version migration, where both source and
target hosts support MTRR, but the source qemu sends unsynchronized MTRR
data (ie. garbage) in the migration stream, but the target passes it to
KVM. I don't know if this is possible, and if so, what to do about it.
:(

(BTW,

    VMSTATE_MTRR_VARS(env.mtrr_var, X86CPU, 8, 8),

should be rebased to MSR_MTRRcap_VCNT too, probably.)

Apologies about the verbiage, I just wrote down whatever crossed my
mind. I don't think I said anything overly important, but I feel unsafe
about giving my R-b until someone disproves my migration worries.
(Basically, before the patch, whatever MTRR data was in the migration
stream never reached KVM. This changes now.)

... Is the following argument valid in your opinion?

KVM cares about guest-specified MTRR values *only* when
kvm_arch_has_noncoherent_dma() returns true to vmx_get_mt_mask(). Since
"kvm_arch_has_noncoherent_dma() returning true" (ie. device assignment)
excludes migration anyway, we don't have to care about migration of
MTRRs.

I'm confused, but that shouldn't block this patch!

Thanks,
Laszlo