On 12/27/2011 08:51 AM, Boris Ostrovsky wrote: > On 12/26/11 08:53, Avi Kivity wrote: >> On 12/19/2011 07:46 PM, Boris Ostrovsky wrote: >>> From: Boris Ostrovsky<boris.ostrovsky@xxxxxxx> >>> >>> In some cases guests should not provide workarounds for errata even >>> when the >>> physical processor is affected. For example, because of erratum 400 >>> on family >>> 10h processors a Linux guest will read an MSR (resulting in VMEXIT) >>> before >>> going to idle in order to avoid getting stuck in a non-C0 state. >>> This is not >>> necessary: HLT and IO instructions are intercepted and therefore >>> there is no >>> reason for erratum 400 workaround in the guest. >>> >>> This patch allows us to present a guest with certain errata as fixed, >>> regardless of the state of actual hardware. >>> >>> >>> + >>> +static int svm_handle_osvw(struct kvm_vcpu *vcpu, >>> + uint32_t msr, uint64_t *val) >>> +{ >>> + struct kvm_cpuid_entry2 *cpuid_entry; >>> + >>> + /* Guest OSVW support */ >>> + cpuid_entry = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); >>> + if (!cpuid_entry || !(cpuid_entry->ecx& bit(X86_FEATURE_OSVW))) >>> + return -1; >>> + >>> + /* >>> + * Guests should see errata 400 and 415 as fixed (assuming that >>> + * HLT and IO instructions are intercepted). >>> + */ >>> + if (msr == MSR_AMD64_OSVW_ID_LENGTH) >>> + *val = (osvw_len>= 3) ? (osvw_len) : 3; >>> + else { >>> + *val = osvw_status& ~(6ULL); >>> + >>> + if (osvw_len == 0&& boot_cpu_data.x86 == 0x10) >>> + /* Mark erratum 298 as present */ >>> + *val |= 1; >>> + } >>> + >>> + return 0; >>> +} >> >> Please move this to common code, to support cross-vendor migration. > > OK. (Note though that the OSVW registers are typically checked during > system boot so if you migrate a running guest it is unlikely that it > will read them again) It will, if it reboots. Of course that removes any consistency issues, but we like to be consistent even if there are no issues. >>> + if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) { >>> + uint64_t len, status; >>> + int err; >>> + >>> + len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH,&err); >>> + if (!err) >>> + status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, >>> + &err); >>> + >>> + spin_lock(&svm_lock); >>> + if (err) >>> + osvw_status = osvw_len = 0; >>> + else { >>> + if (len< osvw_len) >>> + osvw_len = len; >> >> This implies that if a bit is inside len, then the OS must apply the >> workaround? > > > Almost: when osvw bit is inside len then OS workaround is applied if > the bit is set. And if the bit is outside len then OS workaround > should always be applied. Ok. Note that hardware_enable() can be called as part of host cpu hotplug, so the values can change dynamically. Not really sure what to do in this case. > > >> >>> + osvw_status |= status; >>> + osvw_status&= (1ULL<< osvw_len) - 1; >>> + } >>> + spin_unlock(&svm_lock); >>> + } >>> + >>> svm_init_erratum_383(); >>> >>> @@ -3092,6 +3162,10 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, >>> unsigned ecx, u64 data) >>> case MSR_VM_IGNNE: >>> pr_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", >>> ecx, data); >>> break; >>> + case MSR_AMD64_OSVW_ID_LENGTH: >>> + case MSR_AMD64_OSVW_STATUS: >>> + /* Writes are ignored */ >>> + break; >>> default: >>> return kvm_set_msr_common(vcpu, ecx, data); >>> } >> >> Best to allow writes, the manual says writes are allowed for bios code, >> and the OS should just avoid it. > > BIOS is usually the one who sets these bits and OS indeed shouldn't > try to write the registers. > > The reason I decided to ignore the writes is for cases when we are on > a buggy BIOS and a guest writing through to the registers can alter > system state. They'd write to a virtual set of MSRs, not the real ones. > >> >> Not sure what to do here about live migration, since the guest will not >> adjust its behaviour. Should management software read those MSRs from >> userspace and check that they're consistent across a cluster? >> > > Either that or start the guest on the most "broken" processor > (assuming that OSVW registers are only read during boot). > > Alternatively the management SW would set emulated values of the MSRs > for all systems in the cluster but I am not sure whether this is > supported yet. If you allow writing, then host userspace can coordinate across the cluster and use KVM_SET_MSR to set up the most strict values. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html